Date: Thu Mar 26 04:36:03 2020
Scientist: Ran Yin
Sequencing (Waksman): Dibyendu Kumar
Statistics: Davit Sargsyan
Principal Investigator: Ah-Ng Kong

# Taxonomic Ranks:
# **K**ing **P**hillip **C**an n**O**t **F**ind **G**reen **S**ocks
# * Kingdom                
# * Phylum                    
# * Class                   
# * Order                   
# * Family     
# * Genus     
# * Species  
# options(stringsAsFactors = FALSE,
#         scipen = 999)
# # Increase mmemory size to 64 Gb----
# invisible(utils::memory.limit(65536))
# str(knitr::opts_chunk$get())
# # NOTE: the below does not work!
# knitr::opts_chunk$set(echo = FALSE, 
#                       message = FALSE,
#                       warning = FALSE,
#                       error = FALSE)
# require(knitr)
# require(kableExtra)
# require(shiny)
require(phyloseq)
Loading required package: phyloseq
require(data.table)
Loading required package: data.table
data.table 1.12.2 using 18 threads (see ?getDTthreads).  Latest news: r-datatable.com
require(ggplot2)
Loading required package: ggplot2
require(plotly)
Loading required package: plotly

Attaching package: ‘plotly’

The following object is masked from ‘package:ggplot2’:

    last_plot

The following object is masked from ‘package:stats’:

    filter

The following object is masked from ‘package:graphics’:

    layout
require(DT)
Loading required package: DT
require(lmerTest)
Loading required package: lmerTest
Loading required package: lme4
Loading required package: Matrix

Attaching package: ‘lmerTest’

The following object is masked from ‘package:lme4’:

    lmer

The following object is masked from ‘package:stats’:

    step
require(nnet)
Loading required package: nnet
source("source/functions_may2019.R")
# On Windows set multithread=FALSE----
mt <- TRUE

1 Introduction

C57BL/6 wild-type (WT) and Nrf-2 double-knock-out (KO -/-) mice were given 2-week microbiome stabilization process using AIN93M diet and 8 more weeks to treat with either AIN93M or AIN93M 5% PEITC diet. Fecal samples were collected weekly, immediately frozen in liquid nitrogen and stored at -80oC. Serum, cecal, colon epithelial and whole colon tissues at week 10 were also collected for further analyses. Baseline, week 1 and 4 fecal samples were selected for 16s rRNA sequencing.

This document examines results from the WT mice samples.

We will attampt to answer the following questions:
1. Did microbiome change over time?
2. Was microbiome affected by diet?
3. Was there a difference between the KO and WT?
4. If there was a change in microbiome composition, what functional changes did it carry? What are the essential functions of the bacteria affected by the treatment and how can this be shown in vivo (metabolites, inflammation markers, etc.)?

2 Data preprocessing

2.1 Raw Data

FastQ files were downloaded from this Rutgers Box location. A total of 144 files (2 per sample, pair-ended) and a pair of undetermined reads were downloaded.

2.2 Script

This script (nrf2ubiome_dada2_sep2019_v1.Rmd) was developed using DADA2 Pipeline Tutorial (1.12) with tips and tricks from the University of Maryland Shool of Medicine Institute for Genome Sciences (IGS) Microbiome Analysis Workshop (April 8-11, 2019). The output of the DADA2 script (data_may2019/ps_sep2019.RData) is explored in this document.

3 Meta data: sample description

# Load data----
# Counts
load("data_sep2019/ps_sep2019.RData")
# Taxonomy
load("data_sep2019/taxa.RData")
taxa <- data.table(seq16s = rownames(taxa),
                   taxa)

NOTE: correction to the meta-data! (11/15/2019)

correct_samples <- fread("data_sep2019/16s metadata Sep-2019.csv")
ps_sep2019@sam_data$DSS <- correct_samples$DSS

4 Samples

ps_sep2019@sam_data$Genotype_Week <- paste(ps_sep2019@sam_data$genotype,
                                           ps_sep2019@sam_data$time,
                                           sep = "_")
ps_sep2019@sam_data$ID <- factor(paste0(ps_sep2019@sam_data$mice_num,
                                        ps_sep2019@sam_data$cage))
ps_sep2019@sam_data$TREATMENT <- paste0(ps_sep2019@sam_data$DSS,
                                        ps_sep2019@sam_data$PEITC,
                                        ps_sep2019@sam_data$cranberry)
ps_sep2019@sam_data$TREATMENT <- factor(ps_sep2019@sam_data$TREATMENT,
                                        levels = c("000",
                                                   "100",
                                                   "110",
                                                   "101"),
                                        labels = c("Naive",
                                                   "DSS",
                                                   "DSS+PEITC",
                                                   "DSS+Cranberry"))
samples <- ps_sep2019@sam_data
datatable(samples,
          options = list(pageLength = nrow(samples)))

5 Prune data

The OTUs were mapped to Bacteria (96.07%), Eukaryota (2.95%) and Archea (0.03%) kingdoms, and 75 OTUs (0.95%) undefined.

The total of 7,867 unique sequences were found. Out of those, 7,558 were mapped to bacterial genomes.

dim(ps_sep2019@otu_table@.Data)

# Remove OTU not mapped to Bacteria
ps0 <- subset_taxa(ps_sep2019, 
                   Kingdom == "Bacteria")
dim(ps0@otu_table@.Data)

Out of the 7,558 OTUs 7,247 belonged to 12 Phyla. 311 of the OTUs (or 4.11% of bacterial OTUs) could not be mapped to a phylum.

t2 <- data.table(table(tax_table(ps0)[, "Phylum"],
                                  exclude = NULL))
t2$V1[is.na(t2$V1)] <- "Unknown"
setorder(t2, -N)
t2[, pct := N/sum(N)]
setorder(t2, -N)
colnames(t2) <- c("Phylum",
                  "Number of OTUs",
                  "Percent of OTUs")
datatable(t2,
          rownames = FALSE,
          caption = "Number of Bacterial OTUs by Phylum",
          class = "cell-border stripe",
          options = list(search = FALSE,
                         pageLength = nrow(t2))) %>%
  formatCurrency(columns = 2,
                 currency = "",
                 mark = ",",
                 digits = 0) %>%
  formatPercentage(columns = 3,
                   digits = 2)

6 OTU table (first 10 rows)

7 Total counts per sample (i.e. sequencing depth)

t1 <- colSums(otu[, 7:ncol(otu)])
t1 <- data.table(SAMPLE_NAME = names(t1),
                 Total = t1)

t2 <- data.table(SAMPLE_NAME = rownames(samples),
                 ID = samples$ID,
                 CAGE = samples$cage,
                 TREATMENT = samples$TREATMENT,
                 Genotype = samples$genotype,
                 WEEK = samples$time)

smpl <- merge(t1,
              t2,
              by = "SAMPLE_NAME")

p1 <- ggplot(smpl,
             aes(x = SAMPLE_NAME,
                 y = Total,
                 fill = TREATMENT,
                 colour = WEEK)) +
  facet_wrap(~ Genotype,
             scale = "free_x") +
  geom_bar(stat = "identity") +
  scale_x_discrete("") +
  scale_y_continuous("Number of Reads") +
  scale_fill_discrete("Treatment") +
  theme(axis.text.x = element_text(angle = 45,
                                   hjust = 1)) 
ggplotly(p1)
tmp <- copy(smpl)
tmp$WEEK <- factor(tmp$WEEK,
                    levels = c("baseline",
                               "week1",
                               "week8"),
                    labels = c("Week 0",
                               "Week 1",
                               "Week 8"))
tmp$Genotype <- factor(tmp$Genotype,
                       levels = c("widetype",
                                  "nrf2KO"),
                       labels = c("Wild Type",
                                  "Nrf2 KO"))
p1 <- ggplot(tmp,
             aes(x = SAMPLE_NAME,
                 y = Total,
                 group = TREATMENT,
                 fill = TREATMENT)) +
  facet_wrap(~ Genotype + WEEK,
             scale = "free_x") +
  geom_bar(stat = "identity",
           color = "black") +
  scale_x_discrete("") +
  scale_y_continuous("Number of Reads") +
  scale_fill_grey("Treatment", 
                  start = 0.1, 
                  end = 1,
                  na.value = "red",
                  aesthetics = "fill") +
  theme_bw() + 
  theme(panel.border = element_blank(), 
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), 
        axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        legend.position = "top")

tiff(filename = "tmp/seq_depth.tiff",
     height = 6,
     width = 6,
     units = "in",
     res = 600,
     compression = "lzw+p")
print(p1)
graphics.off()

print(p1)

8 Richness (Alpha diversity)

Shannon index (aka Shannon enthrophy) is calculated as:
H’ = -sum(1 to R)p(i)ln(p(i)) When there is exactly 1 type of data (e.g. a single species in the sample), H’=0. The opposite scenario is when there are R>1 species present in the sample in the exact same amounts and H’=ln(R).

Shannon’s diversity index was calculated for each sample and ploted over time using the 7,764 from the 13 Phylum above.

shannon.ndx <- estimate_richness(ps0,
                                 measures = "Shannon")

shannon.ndx <- data.table(SAMPLE_NAME = rownames(shannon.ndx),
                          shannon.ndx)

smpl <- merge(smpl,
              shannon.ndx,
              by = "SAMPLE_NAME")

p1 <- ggplot(smpl,
             aes(x = Total,
                 y = Shannon,
                 fill = Genotype,
                 shape = WEEK)) +
  geom_point(size = 2) +
  scale_shape_manual(breaks = unique(smpl$WEEK),
                     values = 21:23)

tiff(filename = "tmp/shannon_vs_depth.tiff",
     height = 5,
     width = 6,
     units = "in",
     res = 600,
     compression = "lzw+p")
print(p1)
graphics.off()

ggplotly(p1)

Even though estimate_richness function does not adjust for the sequencing depth, there is no correlation between the index and the sample’s sequecing depth. Proceed with the comparison.

9 Shannon idex over time

p1 <- plot_richness(ps0,
                    x = "time", 
                    measures = "Shannon") +
  facet_wrap(~ genotype) +
  geom_line(aes(group = ID),
            color = "black") +
  geom_point(aes(fill = TREATMENT),
             shape = 21,
             size = 3,
             color = "black") +
  scale_x_discrete("") +
  theme(axis.text.x = element_text(angle = 30,
                                   hjust = 1,
                                   vjust = 1))

ggplotly(p = p1,
         tooltip = c("ID",
                     "value"))

p1 <- p1 + 
  scale_fill_discrete("") +
  theme(legend.position = "top")

tiff(filename = "tmp/shannon.tiff",
     height = 4,
     width = 5,
     units = "in",
     res = 600,
     compression = "lzw+p")
print(p1)
graphics.off()

The plot above suggests that the largest differences in alpha diversity (as measured by Shannon’s index) are in genotype.

10 Average Shannon Index

# Average shannon index by treatment group
tmp <- copy(smpl)

tmp[, mu := mean(Shannon),
    by = list(TREATMENT,
              Genotype,
              WEEK)]
tmp[, sem := sd(Shannon)/sqrt(.N),
    by = list(TREATMENT,
                 Genotype,
                 WEEK)]
tmp <- unique(tmp[, c("TREATMENT",
                      "Genotype",
                      "WEEK",
                      "mu",
                      "sem")])
tmp$WEEK <- factor(tmp$WEEK,
                   levels = c("baseline",
                              "week1",
                              "week8"),
                   labels = c("Week 0",
                              "Week 1",
                              "Week 8"))
tmp$Genotype <- factor(tmp$Genotype,
                       levels = c("widetype",
                                  "nrf2KO"),
                       labels = c("Wild Type",
                                  "Nrf2 KO"))

p1 <- ggplot(tmp,
             aes(x = WEEK,
                 y = mu,
                 ymin = mu - sem,
                 ymax = mu + sem,
                 fill = TREATMENT,
                 group = TREATMENT)) +
  facet_wrap(~ Genotype) +
  geom_errorbar(position = position_dodge(0.4),
                width = 0.4) +
  geom_line(position = position_dodge(0.4)) +
  geom_point(size = 3,
             shape = 21,
             position = position_dodge(0.4)) +
  scale_x_discrete("") +
  scale_y_continuous("Shannon Index") +
  theme(axis.text.x = element_text(angle = 45,
                                   hjust = 1),
        legend.position = "top") 

tiff(filename = "tmp/avg_shannon.tiff",
     height = 5,
     width = 6,
     units = "in",
     res = 600,
     compression = "lzw+p")
print(p1)
graphics.off()

print(p1)

Test if the richness changed between the baseline and Week 8.

smpl$TREATMENT <- factor(smpl$TREATMENT,
                         levels = c("DSS",
                                    "Naive",
                                    "DSS+PEITC",
                                    "DSS+Cranberry"))
tmp <- droplevels(smpl[WEEK != "week1"])
m1 <- lm(Shannon  ~ WEEK*(TREATMENT + Genotype),
         # offset = Total,
         data = tmp)
summary(m1)

Call:
lm(formula = Shannon ~ WEEK * (TREATMENT + Genotype), data = tmp)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.316186 -0.091027  0.007886  0.110704  0.293230 

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                       5.94987    0.07064  84.233  < 2e-16 ***
WEEKweek8                         0.01158    0.09989   0.116   0.9084    
TREATMENTNaive                    0.14581    0.08935   1.632   0.1109    
TREATMENTDSS+PEITC               -0.03923    0.08935  -0.439   0.6631    
TREATMENTDSS+Cranberry           -0.22582    0.08935  -2.527   0.0158 *  
Genotypewidetype                 -0.54156    0.06318  -8.572 2.06e-10 ***
WEEKweek8:TREATMENTNaive          0.01181    0.12636   0.093   0.9261    
WEEKweek8:TREATMENTDSS+PEITC      0.01652    0.12636   0.131   0.8966    
WEEKweek8:TREATMENTDSS+Cranberry  0.21535    0.12636   1.704   0.0965 .  
WEEKweek8:Genotypewidetype        0.23085    0.08935   2.584   0.0137 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1548 on 38 degrees of freedom
Multiple R-squared:  0.7845,    Adjusted R-squared:  0.7335 
F-statistic: 15.37 on 9 and 38 DF,  p-value: 3.671e-10
m2 <- lmer(Shannon  ~ WEEK*(TREATMENT + Genotype) + (1 | ID),
           # offset = Total,
           data = tmp)
summary(m2)
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: Shannon ~ WEEK * (TREATMENT + Genotype) + (1 | ID)
   Data: tmp

REML criterion at convergence: -22.2

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.53721 -0.47495  0.06753  0.44489  1.53874 

Random effects:
 Groups   Name        Variance Std.Dev.
 ID       (Intercept) 0.01259  0.1122  
 Residual             0.01136  0.1066  
Number of obs: 48, groups:  ID, 24

Fixed effects:
                                 Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)                       5.94987    0.07064 29.76837  84.233  < 2e-16 ***
WEEKweek8                         0.01158    0.06879 19.00000   0.168  0.86814    
TREATMENTNaive                    0.14581    0.08935 29.76837   1.632  0.11322    
TREATMENTDSS+PEITC               -0.03923    0.08935 29.76837  -0.439  0.66379    
TREATMENTDSS+Cranberry           -0.22582    0.08935 29.76837  -2.527  0.01704 *  
Genotypewidetype                 -0.54156    0.06318 29.76837  -8.572 1.55e-09 ***
WEEKweek8:TREATMENTNaive          0.01181    0.08701 19.00000   0.136  0.89350    
WEEKweek8:TREATMENTDSS+PEITC      0.01652    0.08701 19.00000   0.190  0.85139    
WEEKweek8:TREATMENTDSS+Cranberry  0.21535    0.08701 19.00000   2.475  0.02291 *  
WEEKweek8:Genotypewidetype        0.23085    0.06152 19.00000   3.752  0.00135 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
                     (Intr) WEEKw8 TREATMENTN TREATMENTDSS+P TREATMENTDSS+C Gntypw
WEEKweek8            -0.487                                                       
TREATMENTNv          -0.632  0.308                                                
TREATMENTDSS+P       -0.632  0.308  0.500                                         
TREATMENTDSS+C       -0.632  0.308  0.500      0.500                              
Gentypwdtyp          -0.447  0.218  0.000      0.000          0.000               
WEEK8:TREATMENTN      0.308 -0.632 -0.487     -0.243         -0.243          0.000
WEEK8:TREATMENTDSS+P  0.308 -0.632 -0.243     -0.487         -0.243          0.000
WEEK8:TREATMENTDSS+C  0.308 -0.632 -0.243     -0.243         -0.487          0.000
WEEKwk8:Gnt           0.218 -0.447  0.000      0.000          0.000         -0.487
                     WEEK8:TREATMENTN WEEK8:TREATMENTDSS+P WEEK8:TREATMENTDSS+C
WEEKweek8                                                                      
TREATMENTNv                                                                    
TREATMENTDSS+P                                                                 
TREATMENTDSS+C                                                                 
Gentypwdtyp                                                                    
WEEK8:TREATMENTN                                                               
WEEK8:TREATMENTDSS+P  0.500                                                    
WEEK8:TREATMENTDSS+C  0.500            0.500                                   
WEEKwk8:Gnt           0.000            0.000                0.000              

11 Calculate change in Shannon index from baseline

dd <- smpl
dd[, delta := Shannon - Shannon[WEEK == "baseline"],
   by = ID]
dd$diff <- paste(dd$WEEK,
                 "-baseline",
                 sep = "")
dd <- dd[WEEK != "baseline",]
p1 <- ggplot(dd,
             aes(x = TREATMENT,
                 y = delta,
                 fill = Genotype)) +
  facet_wrap(~ diff) +
  geom_hline(yintercept = 0,
             linetype = "dashed") +
  geom_point(position = position_dodge(0.3),
             shape = 21,
             size = 3) +
  scale_y_continuous("Shannon Index Percent Change from Baseline") +
  theme(axis.text.x = element_text(angle = 45,
                                   hjust = 1))
print(p1)

dd$TREATMENT <- factor(dd$TREATMENT,
                        levels = c("DSS",
                                   "Naive",
                                   "DSS+PEITC",
                                   "DSS+Cranberry"))
dd$Genotype <- factor(dd$Genotype,
                       levels = c("widetype",
                                  "nrf2KO"))
m1 <- lm(delta ~ TREATMENT*Genotype,
         data = dd)
summary(m1)

Call:
lm(formula = delta ~ TREATMENT * Genotype, data = dd)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.40513 -0.09560 -0.02012  0.09568  0.35517 

Coefficients:
                                      Estimate Std. Error t value Pr(>|t|)   
(Intercept)                            0.25142    0.07286   3.451  0.00133 **
TREATMENTNaive                        -0.04426    0.10303  -0.430  0.66985   
TREATMENTDSS+PEITC                    -0.15777    0.10303  -1.531  0.13358   
TREATMENTDSS+Cranberry                 0.04463    0.10303   0.433  0.66720   
Genotypenrf2KO                        -0.18412    0.10303  -1.787  0.08153 . 
TREATMENTNaive:Genotypenrf2KO         -0.02851    0.14571  -0.196  0.84586   
TREATMENTDSS+PEITC:Genotypenrf2KO      0.24747    0.14571   1.698  0.09721 . 
TREATMENTDSS+Cranberry:Genotypenrf2KO  0.07927    0.14571   0.544  0.58946   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1785 on 40 degrees of freedom
Multiple R-squared:  0.249, Adjusted R-squared:  0.1176 
F-statistic: 1.894 on 7 and 40 DF,  p-value: 0.09608
# No significant interactions, proceed with 2-way analysis
m2 <- lm(delta ~ TREATMENT + Genotype,
         data = dd)
summary(m2)

Call:
lm(formula = delta ~ TREATMENT + Genotype, data = dd)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.49158 -0.09742 -0.01290  0.11101  0.35281 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)             0.21414    0.05849   3.661 0.000683 ***
TREATMENTNaive         -0.05851    0.07399  -0.791 0.433377    
TREATMENTDSS+PEITC     -0.03404    0.07399  -0.460 0.647813    
TREATMENTDSS+Cranberry  0.08427    0.07399   1.139 0.261014    
Genotypenrf2KO         -0.10956    0.05232  -2.094 0.042176 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1812 on 43 degrees of freedom
Multiple R-squared:  0.1674,    Adjusted R-squared:  0.08999 
F-statistic: 2.162 on 4 and 43 DF,  p-value: 0.08947

At Week 8 there was significantly smaller increase of alpha diversity from baseline in Nrf2 KO compared to WT, and in DSS+Cranberry compared to DSS only.

12 Load aminoacids

aa <- fread("data_sep2019/sep2019_aminoacids.csv")
aa <- aa[!is.na(ID), ]
aa$ID <- paste0(aa$ID,
                aa$CAGE)
smpl1 <- unique(smpl[, c("ID",
                            "TREATMENT",
                            "Genotype")])
smpl1$ID <- as.character(smpl1$ID)
aa <- merge(smpl1,
            aa,
            by = "ID")
aa[, trt_week := paste(TREATMENT,
                       WEEK,
                       sep = "_")]
aa$trt_week <- factor(aa$trt_week,
                      levels = c("Naive_week2",
                                 "Naive_week6",
                                 "DSS_week2",
                                 "DSS_week6",
                                 "DSS+Cranberry_week2",
                                 "DSS+Cranberry_week6" ,
                                 "DSS+PEITC_week2",
                                 "DSS+PEITC_week6"))

13 Aminoacids by animal, treatment group and timepoint

for (i in 8:(ncol(aa) - 1)) {
  tmp <- aa[, c(1, 3, 7, ncol(aa), i), with = FALSE]
  colnames(tmp)[5] <- "Y"
  p1 <- ggplot(tmp,
               aes(x = trt_week,
                   y = Y,
                   fill = Genotype,
                   group = ID)) +
    geom_line(position = position_dodge(0.3)) +
    geom_point(shape = 21,
               size = 3,
               position = position_dodge(0.3)) +
    scale_x_discrete("") +
    scale_y_continuous(colnames(aa)[i]) +
    theme(axis.text.x = element_text(angle = 45,
                                     hjust = 1))
  # tiff(filename = paste0("tmp/",
  #                        colnames(aa)[i],
  #                        ".tiff"),
  #      height = 4,
  #      width = 5,
  #      units = "in",
  #      res = 600,
  #      compression = "lzw+p")
  # print(p1)
  # graphics.off()
  
  print(p1)
}

14 Aminoacids over time, group averages

out <- list()
for (i in 8:27) {
  tmp <- aa[, c(2, 3, 7, i), 
            with = FALSE]
  names(tmp)[4] <- "val"
  
  tmp[, mu := mean(val,
                   na.rm = TRUE),
      by = list(TREATMENT,
                Genotype,
                WEEK)]
  tmp[, sem := sd(val,
                  na.rm = TRUE)/sqrt(.N),
      by = list(TREATMENT,
                Genotype,
                WEEK)]
  out[[i - 7]] <- data.table(Aminoacid = names(aa)[i],
                             unique(tmp[, c("TREATMENT",
                                            "Genotype",
                                            "WEEK",
                                            "mu",
                                            "sem")]))
}
muaa <- rbindlist(out)
muaa$Aminoacid <- factor(muaa$Aminoacid,
                         levels = unique(muaa$Aminoacid))
muaa$Genotype <- factor(muaa$Genotype,
                        levels = c("widetype",
                                   "nrf2KO"),
                        labels = c("Wild Type",
                                   "Nrf2 KO"))
for (i in 1:nlevels(muaa$Aminoacid)) {
  p1 <- ggplot(muaa[Aminoacid == levels(muaa$Aminoacid)[i], ],
               aes(x = WEEK,
                   y = mu,
                   ymin = mu - sem,
                   ymax = mu + sem,
                   fill = TREATMENT,
                   group = TREATMENT)) +
    facet_wrap(~ Genotype) +
    geom_errorbar(position = position_dodge(0.4),
                  width = 0.4) +
    geom_line(position = position_dodge(0.4)) +
    geom_point(size = 3,
               shape = 21,
               position = position_dodge(0.4)) +
    scale_x_discrete("",
                     breaks = c("week2",
                                "week6"),
                     labels = c("Week 2",
                                "Week 6")) +
    scale_y_continuous(levels(muaa$Aminoacid)[i]) +
    theme(axis.text.x = element_text(angle = 45,
                                     hjust = 1),
          legend.position = "top") 
  
  # tiff(filename = paste0("tmp/avg_",
  #                        levels(muaa$Aminoacid)[i],
  #                        ".tiff"),
  #      height = 5,
  #      width = 6,
  #      units = "in",
  #      res = 600,
  #      compression = "lzw+p")
  # print(p1)
  # graphics.off()
  
  print(p1)
}

15 Aminoacid data PCA

dt_pca <- aa[, Alanine:glutamine]
# m1 <- prcomp(dt_pca,
#              center = TRUE,
#              scale. = TRUE)
# m1 <- prcomp(dt_pca,
#              center = FALSE,
#              scale. = FALSE)
m1 <- prcomp(dt_pca)
summary(m1)
Importance of components:
                          PC1    PC2     PC3     PC4     PC5     PC6     PC7     PC8
Standard deviation     4.8108 2.5833 1.78182 1.13495 0.93367 0.84949 0.62035 0.49052
Proportion of Variance 0.6276 0.1810 0.08609 0.03493 0.02364 0.01957 0.01044 0.00652
Cumulative Proportion  0.6276 0.8086 0.89466 0.92959 0.95323 0.97280 0.98323 0.98976
                           PC9    PC10    PC11    PC12   PC13    PC14    PC15    PC16
Standard deviation     0.35960 0.26786 0.22940 0.20322 0.1486 0.13250 0.11932 0.11154
Proportion of Variance 0.00351 0.00195 0.00143 0.00112 0.0006 0.00048 0.00039 0.00034
Cumulative Proportion  0.99326 0.99521 0.99663 0.99775 0.9983 0.99883 0.99922 0.99955
                          PC17    PC18    PC19    PC20
Standard deviation     0.08896 0.07274 0.05110 0.02565
Proportion of Variance 0.00021 0.00014 0.00007 0.00002
Cumulative Proportion  0.99977 0.99991 0.99998 1.00000
# Select PC-s to pliot (PC1 & PC2)
choices <- 1:2
# Scores, i.e. points (df.u)
dt.scr <- data.table(m1$x[, choices])
# Add grouping variable
dt.scr$grp <- aa$trt_week
dt.scr$TREATMENT <- aa$TREATMENT
dt.scr$WEEK <- aa$WEEK
dt.scr
# Loadings, i.e. arrows (df.v)
dt.rot <- as.data.frame(m1$rotation[, choices])
dt.rot$feat <- rownames(dt.rot)
dt.rot <- data.table(dt.rot)
dt.rot
dt.load <- melt.data.table(dt.rot,
                           id.vars = "feat",
                           measure.vars = 1:2,
                           variable.name = "pc",
                           value.name = "loading")
dt.load$feat <- factor(dt.load$feat,
                       levels = unique(dt.load$feat))
# Plot loadings
p0 <- ggplot(data = dt.load,
             aes(x = feat,
                 y = loading)) +
  facet_wrap(~ pc,
             nrow = 2) +
  geom_bar(stat = "identity") +
  ggtitle("PC Loadings") +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 45,
                                   hjust = 1))
tiff(filename = "tmp/pc.1.2_loadings.tiff",
     height = 5,
     width = 8,
     units = 'in',
     res = 300,
     compression = "lzw+p")
print(p0)
graphics.off()
print(p0)

# Axis labels
u.axis.labs <- paste(colnames(dt.rot)[1:2], 
                     sprintf('(%0.1f%% explained var.)', 
                             100*m1$sdev[choices]^2/sum(m1$sdev^2)))
u.axis.labs
[1] "PC1 (62.8% explained var.)" "PC2 (18.1% explained var.)"
# Based on Figure p0, keep only a few variables with high loadings in PC1 and PC2----
# var.keep.ndx <- which(dt.rot$feat %in% c(...))
# Or select all
# var.keep.ndx <- 3:ncol(dt1)
# Use dt.rot[var.keep.ndx,] and dt.rot$feat[var.keep.ndx]
p1 <- ggplot(data = dt.rot,
             aes(x = PC1,
                 y = PC2)) +
  coord_equal() +
  geom_point(data = dt.scr,
             aes(fill = grp),
             shape = 21,
             size = 2,
             alpha = 0.5) +
  geom_segment(aes(x = 0,
                   y = 0,
                   xend = 10*PC1,
                   yend = 10*PC2),
               arrow = arrow(length = unit(1/2, 'picas')),
               # size = 1, 
               color = "black") +
  geom_text(aes(x = 11*PC1,
                y = 11*PC2,
                label = dt.rot$feat),
            # size = 5,
            hjust = 0.5) +
  scale_x_continuous(u.axis.labs[1]) +
  scale_y_continuous(u.axis.labs[2]) +
  scale_fill_discrete(name = "Group") +
  ggtitle("Biplot of Aminoacids") +
  theme(plot.title = element_text(hjust = 0.5,
                                  size = 20))
tiff(filename = "tmp/aminoacids_biplot.tiff",
     height = 10,
     width = 10,
     units = 'in',
     res = 300,
     compression = "lzw+p")
print(p1)
graphics.off()
ggplotly(p1)
p2 <- ggplot(data = dt.rot,
             aes(x = PC1,
                 y = PC2)) +
  coord_equal() +
  geom_point(data = dt.scr,
             aes(fill = WEEK),
             shape = 21,
             size = 2,
             alpha = 0.5) +
  geom_segment(aes(x = 0,
                   y = 0,
                   xend = 10*PC1,
                   yend = 10*PC2),
               arrow = arrow(length = unit(1/2, 'picas')),
               size = 1.2, 
               color = "black") +
  geom_text(aes(x = 11*PC1,
                y = 11*PC2,
                label = dt.rot$feat),
            # size = 5,
            hjust = 0.5) +
  scale_x_continuous(u.axis.labs[1]) +
  scale_y_continuous(u.axis.labs[2]) +
  scale_fill_discrete(name = "Week") +
  ggtitle("Biplot of Aminoacids") +
  theme(plot.title = element_text(hjust = 0.5,
                                  size = 20))
tiff(filename = "tmp/aminoacids_by_week_biplot.tiff",
     height = 10,
     width = 10,
     units = 'in',
     res = 300,
     compression = "lzw+p")
print(p2)
graphics.off()
ggplotly(p2)
p2 <- ggplot(data = dt.rot,
             aes(x = PC1,
                 y = PC2)) +
  coord_equal() +
  geom_point(data = dt.scr,
             aes(fill = TREATMENT),
             shape = 21,
             size = 2,
             alpha = 0.5) +
  geom_segment(aes(x = 0,
                   y = 0,
                   xend = 10*PC1,
                   yend = 10*PC2),
               arrow = arrow(length = unit(1/2, 'picas')),
               size = 1.2, 
               color = "black") +
  geom_text(aes(x = 11*PC1,
                y = 11*PC2,
                label = dt.rot$feat),
            # size = 5,
            hjust = 0.5) +
  scale_x_continuous(u.axis.labs[1]) +
  scale_y_continuous(u.axis.labs[2]) +
  scale_fill_discrete(name = "Treatment") +
  ggtitle("Biplot of Aminoacids") +
  theme(plot.title = element_text(hjust = 0.5,
                                  size = 20))
tiff(filename = "tmp/aminoacids_by_trt_biplot.tiff",
     height = 10,
     width = 10,
     units = 'in',
     res = 300,
     compression = "lzw+p")
print(p2)
graphics.off()
ggplotly(p2)

16 Remove unmapped OTUs

The 311 unmapped OTUs were removed from further analysis (with 7,247 OTUs left).

ps1 <- subset_taxa(ps0, 
                   !is.na(Phylum))
dim(ps1@otu_table@.Data)
[1]   72 7247

17 Counts at Phylum level

18 Relative abundance (%) at Phylum level

Remove phyla with relative abundance of >= 1% in less than 10% of samples.

t1 <- data.table(Phylum = ra_p$Phylum,
                 `Number of Samples` = rowSums(ra_p[, 2:ncol(ra_p)] >= 0.01))
t1$`Percent Samples` <-  t1$`Number of Samples`/72
setorder(t1, -`Number of Samples`)
datatable(t1,
          rownames = FALSE,
          caption = "Taxonomic  count table",
          class = "cell-border stripe",
          options = list(search = FALSE,
                         pageLength = nrow(t1))) %>%
  formatPercentage(columns = 3,
                   digits = 1)

We will remove Chlamydiae from this analysis.

[1] "Firmicutes, Bacteroidetes, Proteobacteria, Verrucomicrobia, Epsilonbacteraeota, Actinobacteria, Deferribacteres, Patescibacteria, Cyanobacteria"

7,224 OTUs, down from 7,247 OTUs in the previous table.

19 Relative Abundance in Samples at Different Taxonomic Ranks

19.1 1. Class

20 PCA at Class level

dt_pca <- t(ra_c[, 3:ncol(ra_c)])
colnames(dt_pca) <- ra_c$Class
dt_pca_c <- data.table(SAMPLE_NAME = rownames(dt_pca),
                       dt_pca)
dt_pca_c <- merge(smpl,
                  dt_pca_c,
                  by = "SAMPLE_NAME")
# m1 <- prcomp(dt_pca,
#              center = TRUE,
#              scale. = TRUE)
# m1 <- prcomp(dt_pca,
#              center = FALSE,
#              scale. = FALSE)
m1 <- prcomp(dt_pca)
summary(m1)
Importance of components:
                           PC1    PC2    PC3     PC4     PC5     PC6     PC7     PC8
Standard deviation     14.6908 8.5044 7.5927 5.76824 3.41220 2.73038 2.43631 1.46850
Proportion of Variance  0.5294 0.1774 0.1414 0.08162 0.02856 0.01829 0.01456 0.00529
Cumulative Proportion   0.5294 0.7068 0.8482 0.92987 0.95844 0.97672 0.99128 0.99657
                           PC9    PC10    PC11    PC12    PC13      PC14
Standard deviation     0.93207 0.55762 0.43780 0.15826 0.02399 8.187e-17
Proportion of Variance 0.00213 0.00076 0.00047 0.00006 0.00000 0.000e+00
Cumulative Proportion  0.99870 0.99947 0.99994 1.00000 1.00000 1.000e+00
# Select PC-s to pliot (PC1 & PC2)
choices <- 1:2
# Scores, i.e. points (df.u)
dt.scr <- data.table(m1$x[, choices])
# Add grouping variable
dt.scr$grp <- paste(dt_pca_c$TREATMENT,
                    dt_pca_c$WEEK,
                    dt_pca_c$Genotype)
dt.scr$TREATMENT <- dt_pca_c$TREATMENT
dt.scr$WEEK <- dt_pca_c$WEEK
dt.scr$Genotype <- dt_pca_c$Genotype
dt.scr
# Loadings, i.e. arrows (df.v)
dt.rot <- as.data.frame(m1$rotation[, choices])
dt.rot$feat <- rownames(dt.rot)
dt.rot <- data.table(dt.rot)
dt.rot
dt.load <- melt.data.table(dt.rot,
                           id.vars = "feat",
                           measure.vars = 1:2,
                           variable.name = "pc",
                           value.name = "loading")
dt.load$feat <- factor(dt.load$feat,
                       levels = unique(dt.load$feat))
# Plot loadings
p0 <- ggplot(data = dt.load,
             aes(x = feat,
                 y = loading)) +
  facet_wrap(~ pc,
             nrow = 2) +
  geom_bar(stat = "identity") +
  ggtitle("PC Loadings") +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 45,
                                   hjust = 1))
tiff(filename = "tmp/pc.1.2_loadings_class.tiff",
     height = 5,
     width = 8,
     units = 'in',
     res = 300,
     compression = "lzw+p")
print(p0)
graphics.off()
print(p0)

# Axis labels
u.axis.labs <- paste(colnames(dt.rot)[1:2], 
                     sprintf('(%0.1f%% explained var.)', 
                             100*m1$sdev[choices]^2/sum(m1$sdev^2)))
u.axis.labs
[1] "PC1 (52.9% explained var.)" "PC2 (17.7% explained var.)"
# Based on Figure p0, keep only a few variables with high loadings in PC1 and PC2----
# var.keep.ndx <- which(dt.rot$feat %in% c(...))
# Or select all
# var.keep.ndx <- 3:ncol(dt1)
# Use dt.rot[var.keep.ndx,] and dt.rot$feat[var.keep.ndx]
p1 <- ggplot(data = dt.rot,
             aes(x = PC1,
                 y = PC2)) +
  coord_equal() +
  geom_point(data = dt.scr,
             aes(fill = grp),
             shape = 21,
             size = 2,
             alpha = 0.5) +
  geom_segment(aes(x = 0,
                   y = 0,
                   xend = 40*PC1,
                   yend = 40*PC2),
               arrow = arrow(length = unit(1/2, 'picas')),
               # size = 1, 
               color = "black") +
  geom_text(aes(x = 44*PC1,
                y = 44*PC2,
                label = dt.rot$feat),
            # size = 5,
            hjust = 0.5) +
  scale_x_continuous(u.axis.labs[1]) +
  scale_y_continuous(u.axis.labs[2]) +
  scale_fill_discrete(name = "Group") +
  ggtitle("Biplot of Classes of Bacteria") +
  theme(plot.title = element_text(hjust = 0.5,
                                  size = 20)) 
tiff(filename = "tmp/class_biplot_grp.tiff",
     height = 10,
     width = 10,
     units = 'in',
     res = 300,
     compression = "lzw+p")
print(p1)
graphics.off()
ggplotly(p1)
# Find centers of each group
grpg <- "grp"
var1 <- eval(parse(text = paste("dt.scr$",
                                grpg,
                                sep = "")))
cntr <- data.table(Group = unique(var1),
                   PC1 = aggregate(x = dt.scr$PC1,
                                   by = list(var1),
                                   FUN = "mean")$x,
                   PC2 = aggregate(x = dt.scr$PC2,
                                   by = list(var1),
                                   FUN = "mean")$x)
p2 <- p1 + geom_label(data = cntr,
                      aes(x = PC1,
                          y = PC2,
                          label = Group,
                          colour = Group),
                      alpha = 0.5,
                      size = 3) +
  scale_color_discrete(guide = FALSE) +
  theme(legend.position = "none")
print(p2)

# Based on Figure p0, keep only a few variables with high loadings in PC1 and PC2----
# var.keep.ndx <- which(dt.rot$feat %in% c(...))
# Or select all
# var.keep.ndx <- 3:ncol(dt1)
# Use dt.rot[var.keep.ndx,] and dt.rot$feat[var.keep.ndx]
p1 <- ggplot(data = dt.rot,
             aes(x = PC1,
                 y = PC2)) +
  coord_equal() +
  geom_point(data = dt.scr,
             aes(fill = Genotype),
             shape = 21,
             size = 2,
             alpha = 0.5) +
  geom_segment(aes(x = 0,
                   y = 0,
                   xend = 40*PC1,
                   yend = 40*PC2),
               arrow = arrow(length = unit(1/2, 'picas')),
               # size = 1, 
               color = "black") +
  geom_text(aes(x = 44*PC1,
                y = 44*PC2,
                label = dt.rot$feat),
            # size = 5,
            hjust = 0.5) +
  scale_x_continuous(u.axis.labs[1]) +
  scale_y_continuous(u.axis.labs[2]) +
  scale_fill_discrete(name = "Group") +
  ggtitle("Biplot of Classes of Bacteria") +
  theme(plot.title = element_text(hjust = 0.5,
                                  size = 20))
tiff(filename = "tmp/class_biplot_genotype.tiff",
     height = 10,
     width = 10,
     units = 'in',
     res = 300,
     compression = "lzw+p")
print(p1)
graphics.off()
ggplotly(p1)
# Find centers of each group
grpg <- "Genotype"
var1 <- eval(parse(text = paste("dt.scr$",
                                grpg,
                                sep = "")))
cntr <- data.table(Group = unique(var1),
                   PC1 = aggregate(x = dt.scr$PC1,
                                   by = list(var1),
                                   FUN = "mean")$x,
                   PC2 = aggregate(x = dt.scr$PC2,
                                   by = list(var1),
                                   FUN = "mean")$x)
p2 <- p1 + geom_label(data = cntr,
                      aes(x = PC1,
                          y = PC2,
                          label = Group,
                          colour = Group),
                      alpha = 0.5,
                      size = 3) +
  scale_color_discrete(guide = FALSE) +
  theme(legend.position = "none")
print(p2)

# Based on Figure p0, keep only a few variables with high loadings in PC1 and PC2----
# var.keep.ndx <- which(dt.rot$feat %in% c(...))
# Or select all
# var.keep.ndx <- 3:ncol(dt1)
# Use dt.rot[var.keep.ndx,] and dt.rot$feat[var.keep.ndx]
p1 <- ggplot(data = dt.rot,
             aes(x = PC1,
                 y = PC2)) +
  coord_equal() +
  geom_point(data = dt.scr,
             aes(fill = WEEK),
             shape = 21,
             size = 2,
             alpha = 0.5) +
  geom_segment(aes(x = 0,
                   y = 0,
                   xend = 40*PC1,
                   yend = 40*PC2),
               arrow = arrow(length = unit(1/2, 'picas')),
               # size = 1, 
               color = "black") +
  geom_text(aes(x = 44*PC1,
                y = 44*PC2,
                label = dt.rot$feat),
            # size = 5,
            hjust = 0.5) +
  scale_x_continuous(u.axis.labs[1]) +
  scale_y_continuous(u.axis.labs[2]) +
  scale_fill_discrete(name = "Group") +
  ggtitle("Biplot of Classes of Bacteria") +
  theme(plot.title = element_text(hjust = 0.5,
                                  size = 20))
tiff(filename = "tmp/class_biplot_week.tiff",
     height = 10,
     width = 10,
     units = 'in',
     res = 300,
     compression = "lzw+p")
print(p1)
graphics.off()
ggplotly(p1)
# Find centers of each group
grpg <- "WEEK"
var1 <- eval(parse(text = paste("dt.scr$",
                                grpg,
                                sep = "")))
cntr <- data.table(Group = unique(var1),
                   PC1 = aggregate(x = dt.scr$PC1,
                                   by = list(var1),
                                   FUN = "mean")$x,
                   PC2 = aggregate(x = dt.scr$PC2,
                                   by = list(var1),
                                   FUN = "mean")$x)
p2 <- p1 + geom_label(data = cntr,
                      aes(x = PC1,
                          y = PC2,
                          label = Group,
                          colour = Group),
                      alpha = 0.5,
                      size = 3) +
  scale_color_discrete(guide = FALSE) +
  theme(legend.position = "none")
print(p2)

# Based on Figure p0, keep only a few variables with high loadings in PC1 and PC2----
# var.keep.ndx <- which(dt.rot$feat %in% c(...))
# Or select all
# var.keep.ndx <- 3:ncol(dt1)
# Use dt.rot[var.keep.ndx,] and dt.rot$feat[var.keep.ndx]
p1 <- ggplot(data = dt.rot,
             aes(x = PC1,
                 y = PC2)) +
  coord_equal() +
  geom_point(data = dt.scr,
             aes(fill = TREATMENT),
             shape = 21,
             size = 3,
             alpha = 0.5) +
  geom_segment(aes(x = 0,
                   y = 0,
                   xend = 40*PC1,
                   yend = 40*PC2),
               arrow = arrow(length = unit(1/2, 'picas')),
               # size = 1, 
               color = "black") +
  geom_text(aes(x = 44*PC1,
                y = 44*PC2,
                label = dt.rot$feat),
            # size = 5,
            hjust = 0.5) +
  scale_x_continuous(u.axis.labs[1]) +
  scale_y_continuous(u.axis.labs[2]) +
  scale_fill_manual(name = "Treatment",
                    breaks = levels(dt.scr$TREATMENT),
                                    values = c("red",
                                               "green",
                                               "orange",
                                               "blue")) +
                      ggtitle("Biplot of Classes of Bacteria") +
                      theme(plot.title = element_text(hjust = 0.5,
                                                      size = 20))
                    ggplotly(p1)
# Find centers of each group
# grpg <- "TREATMENT"
# var1 <- eval(parse(text = paste("dt.scr$",
#                                 grpg,
#                                 sep = "")))
# cntr <- data.table(Group = levels(var1),
#                    PC1 = aggregate(x = dt.scr$PC1,
#                                    by = list(var1),
#                                    FUN = "mean")$x,
#                    PC2 = aggregate(x = dt.scr$PC2,
#                                    by = list(var1),
#                                    FUN = "mean")$x)
cntr <- data.table(PC1 = aggregate(x = dt.scr$PC1,
                                   by = list(dt.scr$TREATMENT),
                                   FUN = "mean"),
                   PC2 = aggregate(x = dt.scr$PC2,
                                   by = list(var1),
                                   FUN = "mean")$x)
colnames(cntr) <- c("TREATMENT",
                    "PC1",
                    "PC2")
p2 <- p1 + geom_label(data = cntr,
                      aes(x = PC1,
                          y = PC2,
                          label = TREATMENT,
                          colour = TREATMENT),
                      alpha = 0.5,
                      size = 3) +
  scale_color_manual(guide = FALSE,
                    breaks = levels(cntr$TREATMENT),
                                    values = c("red",
                                               "green",
                                               "orange",
                                               "blue")) +
  theme(legend.position = "none")
tiff(filename = "tmp/class_biplot_trt.tiff",
     height = 8,
     width = 8,
     units = 'in',
     res = 300,
     compression = "lzw+p")
print(p2)
graphics.off()
print(p2)

# Covariates only
m1 <- multinom(TREATMENT ~ WEEK + Genotype,
               data = dt.scr)
# weights:  20 (12 variable)
initial  value 99.813194 
final  value 99.813194 
converged
summary(m1)
Call:
multinom(formula = TREATMENT ~ WEEK + Genotype, data = dt.scr)

Coefficients:
              (Intercept) WEEKweek1 WEEKweek8 Genotypewidetype
Naive                   0         0         0                0
DSS+PEITC               0         0         0                0
DSS+Cranberry           0         0         0                0

Std. Errors:
              (Intercept) WEEKweek1 WEEKweek8 Genotypewidetype
Naive           0.6666667 0.8164966 0.8164966        0.6666667
DSS+PEITC       0.6666667 0.8164966 0.8164966        0.6666667
DSS+Cranberry   0.6666667 0.8164966 0.8164966        0.6666667

Residual Deviance: 199.6264 
AIC: 223.6264 
prd1 <- predict(m1)
t1 <- table(Predicted = prd1,
            Observed = dt.scr$TREATMENT)
# PC1 alone
m2 <- multinom(TREATMENT ~ PC1,
               data = dt.scr)
# weights:  12 (6 variable)
initial  value 99.813194 
iter  10 value 84.652280
final  value 84.650617 
converged
summary(m2)
Call:
multinom(formula = TREATMENT ~ PC1, data = dt.scr)

Coefficients:
              (Intercept)         PC1
Naive         -0.84291965 -0.13135533
DSS+PEITC      0.02563488 -0.01778110
DSS+Cranberry -0.28377674  0.04267039

Std. Errors:
              (Intercept)        PC1
Naive           0.5329226 0.03966130
DSS+PEITC       0.3375675 0.02567473
DSS+Cranberry   0.3902129 0.02668075

Residual Deviance: 169.3012 
AIC: 181.3012 
prd2 <- predict(m2)
t2 <- table(Predicted = prd2,
            Observed = dt.scr$TREATMENT)
# PC1 with covariates
m3 <- multinom(TREATMENT ~ PC1 + WEEK + Genotype,
               data = dt.scr)
# weights:  24 (15 variable)
initial  value 99.813194 
iter  10 value 78.199054
final  value 77.966110 
converged
summary(m3)
Call:
multinom(formula = TREATMENT ~ PC1 + WEEK + Genotype, data = dt.scr)

Coefficients:
              (Intercept)         PC1   WEEKweek1  WEEKweek8 Genotypewidetype
Naive         -1.69181404 -0.16561286 -0.19417414 -0.2467395        2.0439498
DSS+PEITC     -0.22775882 -0.03756158 -0.03426154 -0.1624952        0.6998366
DSS+Cranberry  0.04994931  0.11984007  0.18496264  0.6354061       -2.2783047

Std. Errors:
              (Intercept)        PC1 WEEKweek1 WEEKweek8 Genotypewidetype
Naive           0.9208565 0.04798080 1.0184358 0.9866132        1.0821427
DSS+PEITC       0.7006164 0.03813898 0.8248932 0.8419015        0.9838589
DSS+Cranberry   0.7578087 0.04894181 0.8842508 0.9154535        1.1724461

Residual Deviance: 155.9322 
AIC: 185.9322 
prd3 <- predict(m3)
t3 <- table(Predicted = prd3,
            Observed = dt.scr$TREATMENT)
# PC1 + PC2 with covariates
m4 <- multinom(TREATMENT ~ PC1 + PC2 + WEEK + Genotype,
               data = dt.scr)
# weights:  28 (18 variable)
initial  value 99.813194 
iter  10 value 77.643401
iter  20 value 76.157664
final  value 76.157652 
converged
summary(m4)
Call:
multinom(formula = TREATMENT ~ PC1 + PC2 + WEEK + Genotype, data = dt.scr)

Coefficients:
              (Intercept)         PC1         PC2  WEEKweek1   WEEKweek8 Genotypewidetype
Naive          -1.6723699 -0.16055402  0.02692425  0.0603261 -0.04440483        1.7116558
DSS+PEITC      -0.1610640 -0.03200954 -0.04731926 -0.3059877 -0.35846181        0.6645649
DSS+Cranberry  -0.1487237  0.11347301  0.04018812  0.3641810  0.72081476       -2.0106669

Std. Errors:
              (Intercept)        PC1        PC2 WEEKweek1 WEEKweek8 Genotypewidetype
Naive           0.9329651 0.04827333 0.04795722 1.1064244 1.0330649         1.083235
DSS+PEITC       0.7176961 0.03961699 0.04860606 0.8745219 0.8762840         1.022935
DSS+Cranberry   0.8377326 0.04936837 0.06510574 0.9608816 0.9434467         1.204373

Residual Deviance: 152.3153 
AIC: 188.3153 
prd4 <- predict(m4)
t4 <- table(Predicted = prd4,
            Observed = dt.scr$TREATMENT)
# Confusion tables
datatable(cbind(t1), 
          caption = "Covariates Only")

datatable(cbind(t2),
          caption = "PC1 Only")

datatable(cbind(t3),
          caption = "PC1 with covariates")

datatable(cbind(t4),
          caption = "PC1 + PC2 with covariates")

# Compare models
anova(m1, m3)
Likelihood ratio tests of Multinomial Models

Response: TREATMENT
                  Model Resid. df Resid. Dev   Test    Df LR stat.      Pr(Chi)
1       WEEK + Genotype       204   199.6264                                   
2 PC1 + WEEK + Genotype       201   155.9322 1 vs 2     3 43.69417 1.752679e-09
anova(m2, m3)
Likelihood ratio tests of Multinomial Models

Response: TREATMENT
                  Model Resid. df Resid. Dev   Test    Df LR stat.   Pr(Chi)
1                   PC1       210   169.3012                                
2 PC1 + WEEK + Genotype       201   155.9322 1 vs 2     9 13.36901 0.1466072
anova(m4, m3)
Likelihood ratio tests of Multinomial Models

Response: TREATMENT
                        Model Resid. df Resid. Dev   Test    Df LR stat.  Pr(Chi)
1       PC1 + WEEK + Genotype       201   155.9322                               
2 PC1 + PC2 + WEEK + Genotype       198   152.3153 1 vs 2     3 3.616917 0.305912

The results suggest that:
a. Covariates alone (timepoint and genotype) cannot explain the difference between treatments.
b. Principal Component 1 (PC1) can explain the differences in relative abundance of classes in the samples. The model does not improve significantly by adding the covariates or the PC2. However, the covariates should stay in the model for adjustment, and PC2 slightly improves the predictions.
c. The full model (PC1 + PC2 + Week + Genotype) correctly classifies 12 out of 18 Naive samples, and 12 out of 18 DSS+Cranburry samples.

Continuing the same analysis at Order, Family and Genus levels.

DO ROC AUC NEST!!!

# # Output probobilities----
# prd1.1 <- predict(m1,
#                   type = "probs")
# prd1.1 <- data.table(ID = df.u$ID,
#                      Treatment = df.u$Treatment,
#                      round(prd1.1,
#                            4))
# 
# prd2.1 <- predict(m2,
#                   type = "probs")
# prd2.1 <- data.table(ID = df.u$ID,
#                      Treatment = df.u$Treatment,
#                      round(prd2.1,
#                            4))
# 
# # Sensitivity/Specificity
# # Tresholds
# trhd <- seq(0, 1, by = 0.01)
# 
# out1 <- list()
# for (i in 1:length(trhd)) {
#   tmp <- apply(prd1.1[, -c(1:2)],
#                MARGIN = 2,
#                FUN = function(a) {
#                  return(a >= trhd[i])
#                })
#   tmp2 <- apply(X = tmp,
#                 MARGIN = 2,
#                 FUN = function(a) {
#                   aggregate(x = a,
#                             by = list(prd1.1$Treatment),
#                             FUN = sum)$x
#                 })
#   tmp2
#   out1[[i]] <- c(sens = sum(diag(tmp2))/nrow(prd1.1),
#                  spec = (sum(tmp2[upper.tri(tmp2)]) + 
#                            sum(tmp2[lower.tri(tmp2)]))/(nrow(tmp)*(ncol(tmp) - 1)))
# }
# out1 <- data.table(do.call("rbind", out1))
# out1 <- unique(out1)
# 
# out2 <- list()
# for (i in 1:length(trhd)) {
#   tmp <- apply(prd2.1[, -c(1:2)],
#                MARGIN = 2,
#                FUN = function(a) {
#                  return(a >= trhd[i])
#                })
#   tmp2 <- apply(X = tmp,
#                 MARGIN = 2,
#                 FUN = function(a) {
#                   aggregate(x = a,
#                             by = list(prd2.1$Treatment),
#                             FUN = sum)$x
#                 })
#   out2[[i]] <- c(sens = sum(diag(tmp2))/nrow(prd2.1),
#                  spec = (sum(tmp2[upper.tri(tmp2)]) + 
#                            sum(tmp2[lower.tri(tmp2)]))/(nrow(tmp)*(ncol(tmp) - 1)))
# }
# out2 <- data.table(do.call("rbind", out2))
# out2 <- unique(out2)
# 
# # ROC
# roc1 <- auc(x = out1$spec,
#             y = out1$sens,
#             from = 0,
#             to = 1)
# 
# roc2 <- auc(x = out2$spec,
#             y = out2$sens,
#             from = 0,
#             to = 1)
# 
# # ROC plot
# plot(out1$sens ~ out1$spec,
#      type = "l",
#      xlim = c(0, 1),
#      ylim = c(0, 1),
#      xlab = "1 - Specificity",
#      ylab = "Sensitivity",
#      col = "blue")
# lines(out2$sens ~ out2$spec,
#       col = "red")
# text(x = c(0.8, 0.8),
#      y = c(0.2, 0.3),
#      label = c(paste("ROC(PC1) = ",
#                      round(roc1, 
#                            3)),
#                paste("\nROC(PC1+PC2) = ",
#                      round(roc2,
#                            3))),
#      col = c("blue",
#              "red"))
# abline(0, 1, lty = 2)
mu$Trt_Genotype <- factor(paste(mu$Treatment,
                          mu$Genotype,
                          sep = "_"))
p0 <- ggplot(mu,
             aes(x = Week,
                 y = x,
                 group = Trt_Genotype)) +
  facet_wrap(~ Class,
             scale = "free_y") +
  geom_line(position = position_dodge(0.3)) +
  geom_point(aes(fill = Trt_Genotype),
             shape = 21,
             size = 2,
             alpha = 0.5,
             position = position_dodge(0.3)) +
  scale_x_discrete("") +
  scale_y_continuous("Relative Abundance (%)") +
  theme(legend.position = "top",
        axis.text.x = element_text(angle = 45,
                                   hjust = 1))
tiff(filename = "tmp/wt_class_over_time.tiff",
     height = 5,
     width = 7,
     units = "in",
     res = 600,
     compression = "lzw+p")
print(p0)
graphics.off()
print(p0)

p1 <- ggplot(mu,
             aes(x = x,
                 y = Class,
                 color = Trt_Genotype,
                 shape = Week)) +
  geom_point(size = 3,
             alpha = 0.5) +
  geom_vline(xintercept = 1,
             linetype = "dashed") +
  scale_x_continuous("Relative Abundance (%)") 
tiff(filename = "tmp/wt_class_ra.tiff",
     height = 4,
     width = 7,
     units = "in",
     res = 600,
     compression = "lzw+p")
print(p1)
graphics.off()
ggplotly(p1+
  theme(legend.position = "none"))

20.1 2. Order

mu$Trt_Genotype <- factor(paste(mu$Treatment,
                                mu$Genotype,
                                sep = "_"))

p0 <- ggplot(mu,
             aes(x = Week,
                 y = x,
                 group = Trt_Genotype)) +
  facet_wrap(~ Order,
             scale = "free_y") +
  geom_line(position = position_dodge(0.3)) +
  geom_point(aes(fill = Trt_Genotype),
             shape = 21,
             size = 2,
             alpha = 0.5,
             position = position_dodge(0.3)) +
  scale_x_discrete("") +
  scale_y_continuous("Relative Abundance (%)") +
  theme(legend.position = "top",
        axis.text.x = element_text(angle = 45,
                                   hjust = 1))

tiff(filename = "tmp/wt_Order_over_time.tiff",
     height = 5,
     width = 7,
     units = "in",
     res = 600,
     compression = "lzw+p")
print(p0)
graphics.off()

print(p0)
p1 <- ggplot(mu,
             aes(x = x,
                 y = Order,
                 color = Trt_Genotype,
                 shape = Week)) +
  geom_point(size = 3,
             alpha = 0.5) +
  geom_vline(xintercept = 1,
             linetype = "dashed") +
  scale_x_continuous("Relative Abundance (%)")

tiff(filename = "tmp/wt_Order_ra.tiff",
     height = 4,
     width = 7,
     units = "in",
     res = 600,
     compression = "lzw+p")
print(p1)
graphics.off()

ggplotly(p1 +
           theme(legend.position = "none"))

20.2 3. Family

NOTE: only the first 24 families had large enough counts - ploting only them.

mu$Trt_Genotype <- factor(paste(mu$Treatment,
                                mu$Genotype,
                                sep = "_"))
mu1 <- droplevels(mu[Family %in% levels(mu$Family)[nlevels(mu$Family):(nlevels(mu$Family) - 24)], ])

p0 <- ggplot(mu1,
             aes(x = Week,
                 y = x,
                 group = Trt_Genotype)) +
  facet_wrap(~ Family,
             scale = "free_y") +
  geom_line(position = position_dodge(0.3)) +
  geom_point(aes(fill = Trt_Genotype),
             shape = 21,
             size = 2,
             alpha = 0.5,
             position = position_dodge(0.3)) +
  scale_x_discrete("") +
  scale_y_continuous("Relative Abundance (%)") +
  theme(legend.position = "top",
        axis.text.x = element_text(angle = 45,
                                   hjust = 1))

tiff(filename = "tmp/wt_Family_over_time.tiff",
     height = 7,
     width = 9,
     units = "in",
     res = 600,
     compression = "lzw+p")
print(p0)
graphics.off()

print(p0)
p1 <- ggplot(mu1,
             aes(x = x,
                 y = Family,
                 color = Trt_Genotype,
                 shape = Week)) +
  geom_point(size = 3,
             alpha = 0.5) +
  geom_vline(xintercept = 1,
             linetype = "dashed") +
  scale_x_continuous("Relative Abundance (%)") +
  theme(legend.position = "top")

tiff(filename = "tmp/wt_Family_ra.tiff",
     height = 4,
     width = 7,
     units = "in",
     res = 600,
     compression = "lzw+p")
print(p1)
graphics.off()

ggplotly(p1+
           theme(legend.position = "none"))

20.3 4. Genus

mu$Trt_Genotype <- factor(paste(mu$Treatment,
                                mu$Genotype,
                                sep = "_"))
mu1 <- droplevels(mu[Genus %in% levels(mu$Genus)[nlevels(mu$Genus):(nlevels(mu$Genus) - 35)], ])

p0 <- ggplot(mu1,
             aes(x = Week,
                 y = x,
                 group = Trt_Genotype)) +
  facet_wrap(~ Genus,
             scale = "free_y") +
  geom_line(position = position_dodge(0.3)) +
  geom_point(aes(fill = Trt_Genotype),
             shape = 21,
             size = 2,
             alpha = 0.5,
             position = position_dodge(0.3)) +
  scale_x_discrete("") +
  scale_y_continuous("Relative Abundance (%)") +
  theme(legend.position = "top",
        axis.text.x = element_text(angle = 45,
                                   hjust = 1))

tiff(filename = "tmp/wt_Genus_over_time.tiff",
     height = 9,
     width = 12,
     units = "in",
     res = 600,
     compression = "lzw+p")
print(p0)
graphics.off()

print(p0+
           theme(legend.position = "none"))
p1 <- ggplot(mu1,
             aes(x = x,
                 y = Genus,
                 color = Trt_Genotype,
                 shape = Week)) +
  geom_point(size = 3,
             alpha = 0.5) +
  geom_vline(xintercept = 1,
             linetype = "dashed") +
  scale_x_continuous("Relative Abundance (%)") +
  theme(legend.position = "top")

tiff(filename = "tmp/wt_Genus_ra.tiff",
     height = 9,
     width = 9,
     units = "in",
     res = 600,
     compression = "lzw+p")
print(p1)
graphics.off()

ggplotly(p1+
           theme(legend.position = "none"))

21 Session Information

sessionInfo()
---
title: "Data Visualization of WT and Nrf2 KO (-/-) BL6 PEITC or Cranberry Treated Mice 16S Microbiome Data Analysis, September 2019 Batch"
output: 
  html_notebook:
    toc: yes
    toc_float: yes
    number_sections: yes
    code_folding: hide
---
Date: `r date()`     
Scientist: [Ran Yin](mailto:ry147@scarletmail.rutgers.edu)      
Sequencing (Waksman): [Dibyendu Kumar](mailto:dk@waksman.rutgers.edu)      
Statistics: [Davit Sargsyan](mailto:sargdavid@gmail.com)      
Principal Investigator: [Ah-Ng Kong](mailto:kongt@pharmacy.rutgers.edu) 

```{}
# Taxonomic Ranks:
# **K**ing **P**hillip **C**an n**O**t **F**ind **G**reen **S**ocks
# * Kingdom                
# * Phylum                    
# * Class                   
# * Order                   
# * Family     
# * Genus     
# * Species  
```

```{r setup}
# options(stringsAsFactors = FALSE,
#         scipen = 999)

# # Increase mmemory size to 64 Gb----
# invisible(utils::memory.limit(65536))


# str(knitr::opts_chunk$get())
# # NOTE: the below does not work!
# knitr::opts_chunk$set(echo = FALSE, 
#                       message = FALSE,
#                       warning = FALSE,
#                       error = FALSE)

# require(knitr)
# require(kableExtra)
# require(shiny)

require(phyloseq)
require(data.table)
require(ggplot2)
require(plotly)
require(DT)
require(lmerTest)
require(nnet)

source("source/functions_may2019.R")

# On Windows set multithread=FALSE----
mt <- TRUE
```

# Introduction
C57BL/6 wild-type (WT) and Nrf-2 double-knock-out (KO -/-) mice were given 2-week microbiome stabilization process using AIN93M diet and 8 more weeks to treat with either AIN93M or AIN93M 5% PEITC diet. Fecal samples were collected weekly, immediately frozen in liquid nitrogen and stored at -80^o^C. Serum, cecal, colon epithelial and whole colon tissues at week 10 were also collected for further analyses. Baseline, week 1 and 4 fecal samples were selected for 16s rRNA sequencing.  
  
This document examines results from the WT mice samples.  
  
We will attampt to answer the following questions:  
1. Did microbiome change over time?  
2. Was microbiome affected by diet?  
3. Was there a difference between the KO and WT?  
4. If there was a change in microbiome composition, what functional changes did it carry? What are the essential functions of the bacteria affected by the treatment and how can this be shown in vivo (metabolites, inflammation markers, etc.)?

# Data preprocessing
## Raw Data 
FastQ files were downloaded from [this Rutgers Box location](https://rutgers.app.box.com/folder/90143462291). A total of 144 files (2 per sample, pair-ended) and a pair of undetermined reads were downloaded. 

## Script
This script (***nrf2ubiome_dada2_sep2019_v1.Rmd***) was developed using [DADA2 Pipeline Tutorial (1.12)](https://benjjneb.github.io/dada2/tutorial.html) with tips and tricks from the [University of Maryland Shool of Medicine Institute for Genome Sciences (IGS)](http://www.igs.umaryland.edu/) [Microbiome Analysis Workshop (April 8-11, 2019)](http://www.igs.umaryland.edu/education/wkshp_metagenome.php). The output of the DADA2 script (***data_may2019/ps_sep2019.RData***) is explored in this document.

# Meta data: sample description
```{r data}
# Load data----
# Counts
load("data_sep2019/ps_sep2019.RData")

# Taxonomy
load("data_sep2019/taxa.RData")
taxa <- data.table(seq16s = rownames(taxa),
                   taxa)
```

**NOTE: correction to the meta-data!** (11/15/2019)
```{r correct_meta_data}
correct_samples <- fread("data_sep2019/16s metadata Sep-2019.csv")
ps_sep2019@sam_data$DSS <- correct_samples$DSS
```

# Samples
```{r samples}
ps_sep2019@sam_data$Genotype_Week <- paste(ps_sep2019@sam_data$genotype,
                                           ps_sep2019@sam_data$time,
                                           sep = "_")
ps_sep2019@sam_data$ID <- factor(paste0(ps_sep2019@sam_data$mice_num,
                                        ps_sep2019@sam_data$cage))

ps_sep2019@sam_data$TREATMENT <- paste0(ps_sep2019@sam_data$DSS,
                                        ps_sep2019@sam_data$PEITC,
                                        ps_sep2019@sam_data$cranberry)
ps_sep2019@sam_data$TREATMENT <- factor(ps_sep2019@sam_data$TREATMENT,
                                        levels = c("000",
                                                   "100",
                                                   "110",
                                                   "101"),
                                        labels = c("Naive",
                                                   "DSS",
                                                   "DSS+PEITC",
                                                   "DSS+Cranberry"))

samples <- ps_sep2019@sam_data
datatable(samples,
          options = list(pageLength = nrow(samples)))
```

# Prune data
The OTUs were mapped to Bacteria (96.07%), Eukaryota (2.95%) and Archea (0.03%) kingdoms, and  75 OTUs (0.95%) undefined. 

```{r check_mapping_kingdom, warning = FALSE, echo = FALSE, message = FALSE}
t1 <- data.table(table(tax_table(ps_sep2019)[, "Kingdom"],
                       exclude = NULL))
t1$V1[is.na(t1$V1)] <- "Unknown"

t1[, pct := N/sum(N)]
setorder(t1, -N)

colnames(t1) <- c("Kingdom",
                  "Number of OTUs",
                  "Percent of OTUs")
datatable(t1,
          rownames = FALSE,
          caption = "Number of OTUs by Kingdom",
          class = "cell-border stripe",
          options = list(search = FALSE,
                         pageLength = nrow(t1))) %>%
  formatCurrency(columns = 2,
                 currency = "",
                 mark = ",",
                 digits = 0) %>%
  formatPercentage(columns = 3,
                   digits = 2)
```

The total of 7,867 unique sequences were found. Out of those, 7,558 were mapped to bacterial genomes. 

```{r keep_bacteria}
dim(ps_sep2019@otu_table@.Data)

# Remove OTU not mapped to Bacteria
ps0 <- subset_taxa(ps_sep2019, 
                   Kingdom == "Bacteria")
dim(ps0@otu_table@.Data)
```
  
Out of the 7,558 OTUs 7,247 belonged to 12 Phyla. 311 of the OTUs (or 4.11% of bacterial OTUs) could not be mapped to a phylum.

```{r phylum_mapping}
t2 <- data.table(table(tax_table(ps0)[, "Phylum"],
                                  exclude = NULL))
t2$V1[is.na(t2$V1)] <- "Unknown"
setorder(t2, -N)
t2[, pct := N/sum(N)]
setorder(t2, -N)

colnames(t2) <- c("Phylum",
                  "Number of OTUs",
                  "Percent of OTUs")

datatable(t2,
          rownames = FALSE,
          caption = "Number of Bacterial OTUs by Phylum",
          class = "cell-border stripe",
          options = list(search = FALSE,
                         pageLength = nrow(t2))) %>%
  formatCurrency(columns = 2,
                 currency = "",
                 mark = ",",
                 digits = 0) %>%
  formatPercentage(columns = 3,
                   digits = 2)
```

# OTU table (first 10 rows)
```{r otu_table, warning=FALSE,echo=FALSE,message=FALSE}
otu <- data.table(ps0@tax_table@.Data,
                  t(ps0@otu_table@.Data))
datatable(head(otu, 10),
          rownames = FALSE,
          caption = "Taxonomic  count table",
          class = "cell-border stripe",
          options = list(search = FALSE,
                         pageLength = 10)) %>%
  formatCurrency(columns = 7:36,
                 currency = "",
                 mark = ",",
                 digits = 0)
```

# Total counts per sample (i.e. sequencing depth)
```{r seq_depth, fig.width = 10,fig.height = 5}
t1 <- colSums(otu[, 7:ncol(otu)])
t1 <- data.table(SAMPLE_NAME = names(t1),
                 Total = t1)

t2 <- data.table(SAMPLE_NAME = rownames(samples),
                 ID = samples$ID,
                 CAGE = samples$cage,
                 TREATMENT = samples$TREATMENT,
                 Genotype = samples$genotype,
                 WEEK = samples$time)

smpl <- merge(t1,
              t2,
              by = "SAMPLE_NAME")

p1 <- ggplot(smpl,
             aes(x = SAMPLE_NAME,
                 y = Total,
                 fill = TREATMENT,
                 colour = WEEK)) +
  facet_wrap(~ Genotype,
             scale = "free_x") +
  geom_bar(stat = "identity") +
  scale_x_discrete("") +
  scale_y_continuous("Number of Reads") +
  scale_fill_discrete("Treatment") +
  theme(axis.text.x = element_text(angle = 45,
                                   hjust = 1)) 
ggplotly(p1)
```

```{r seq_depth_greyscale, , fig.width = 6, fig.height = 6}
tmp <- copy(smpl)
tmp$WEEK <- factor(tmp$WEEK,
                    levels = c("baseline",
                               "week1",
                               "week8"),
                    labels = c("Week 0",
                               "Week 1",
                               "Week 8"))
tmp$Genotype <- factor(tmp$Genotype,
                       levels = c("widetype",
                                  "nrf2KO"),
                       labels = c("Wild Type",
                                  "Nrf2 KO"))
p1 <- ggplot(tmp,
             aes(x = SAMPLE_NAME,
                 y = Total,
                 group = TREATMENT,
                 fill = TREATMENT)) +
  facet_wrap(~ Genotype + WEEK,
             scale = "free_x") +
  geom_bar(stat = "identity",
           color = "black") +
  scale_x_discrete("") +
  scale_y_continuous("Number of Reads") +
  scale_fill_grey("Treatment", 
                  start = 0.1, 
                  end = 1,
                  na.value = "red",
                  aesthetics = "fill") +
  theme_bw() + 
  theme(panel.border = element_blank(), 
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), 
        axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        legend.position = "top")

tiff(filename = "tmp/seq_depth.tiff",
     height = 6,
     width = 6,
     units = "in",
     res = 600,
     compression = "lzw+p")
print(p1)
graphics.off()

print(p1)
```

# Richness (Alpha diversity)
Shannon index (aka Shannon enthrophy) is calculated as:  
H' = -sum(1 to R)p(i)ln(p(i)) 
When there is exactly 1 type of data (e.g. a single species in the sample), H'=0. The opposite scenario is when there are R>1 species present in the sample in the exact same amounts and H'=ln(R).  
  
Shannon's diversity index was calculated for each sample and ploted over time using the 7,764 from the 13 Phylum above.
  
```{r shannon_vs_depth, fig.height = 5, fig.width = 6}
shannon.ndx <- estimate_richness(ps0,
                                 measures = "Shannon")

shannon.ndx <- data.table(SAMPLE_NAME = rownames(shannon.ndx),
                          shannon.ndx)

smpl <- merge(smpl,
              shannon.ndx,
              by = "SAMPLE_NAME")

p1 <- ggplot(smpl,
             aes(x = Total,
                 y = Shannon,
                 fill = Genotype,
                 shape = WEEK)) +
  geom_point(size = 2) +
  scale_shape_manual(breaks = unique(smpl$WEEK),
                     values = 21:23)

tiff(filename = "tmp/shannon_vs_depth.tiff",
     height = 5,
     width = 6,
     units = "in",
     res = 600,
     compression = "lzw+p")
print(p1)
graphics.off()

ggplotly(p1)
```

Even though ***estimate_richness*** function does not adjust for the sequencing depth, there is no correlation between the index and the sample's sequecing depth. Proceed with the comparison.

# Shannon idex over time
```{r richness, fig.width = 8, fig.height = 5}
p1 <- plot_richness(ps0,
                    x = "time", 
                    measures = "Shannon") +
  facet_wrap(~ genotype) +
  geom_line(aes(group = ID),
            color = "black") +
  geom_point(aes(fill = TREATMENT),
             shape = 21,
             size = 3,
             color = "black") +
  scale_x_discrete("") +
  theme(axis.text.x = element_text(angle = 30,
                                   hjust = 1,
                                   vjust = 1))

ggplotly(p = p1,
         tooltip = c("ID",
                     "value"))

p1 <- p1 + 
  scale_fill_discrete("") +
  theme(legend.position = "top")

tiff(filename = "tmp/shannon.tiff",
     height = 4,
     width = 5,
     units = "in",
     res = 600,
     compression = "lzw+p")
print(p1)
graphics.off()
```

The plot above suggests that the largest differences in alpha diversity (as measured by Shannon's index) are in genotype.

# Average Shannon Index
```{r avg_shannon_plot, fig.width = 8, fig.height = 5}
# Average shannon index by treatment group
tmp <- copy(smpl)

tmp[, mu := mean(Shannon),
    by = list(TREATMENT,
              Genotype,
              WEEK)]
tmp[, sem := sd(Shannon)/sqrt(.N),
    by = list(TREATMENT,
                 Genotype,
                 WEEK)]
tmp <- unique(tmp[, c("TREATMENT",
                      "Genotype",
                      "WEEK",
                      "mu",
                      "sem")])
tmp$WEEK <- factor(tmp$WEEK,
                   levels = c("baseline",
                              "week1",
                              "week8"),
                   labels = c("Week 0",
                              "Week 1",
                              "Week 8"))
tmp$Genotype <- factor(tmp$Genotype,
                       levels = c("widetype",
                                  "nrf2KO"),
                       labels = c("Wild Type",
                                  "Nrf2 KO"))

p1 <- ggplot(tmp,
             aes(x = WEEK,
                 y = mu,
                 ymin = mu - sem,
                 ymax = mu + sem,
                 fill = TREATMENT,
                 group = TREATMENT)) +
  facet_wrap(~ Genotype) +
  geom_errorbar(position = position_dodge(0.4),
                width = 0.4) +
  geom_line(position = position_dodge(0.4)) +
  geom_point(size = 3,
             shape = 21,
             position = position_dodge(0.4)) +
  scale_x_discrete("") +
  scale_y_continuous("Shannon Index") +
  theme(axis.text.x = element_text(angle = 45,
                                   hjust = 1),
        legend.position = "top") 

tiff(filename = "tmp/avg_shannon.tiff",
     height = 5,
     width = 6,
     units = "in",
     res = 600,
     compression = "lzw+p")
print(p1)
graphics.off()

print(p1)
```
  
Test if the richness changed between the baseline and Week 8.  
  
```{r lm_richness}
smpl$TREATMENT <- factor(smpl$TREATMENT,
                         levels = c("DSS",
                                    "Naive",
                                    "DSS+PEITC",
                                    "DSS+Cranberry"))

tmp <- droplevels(smpl[WEEK != "week1"])

m1 <- lm(Shannon  ~ WEEK*(TREATMENT + Genotype),
         # offset = Total,
         data = tmp)
summary(m1)
```
  
```{r lmer_richness}
m2 <- lmer(Shannon  ~ WEEK*(TREATMENT + Genotype) + (1 | ID),
           # offset = Total,
           data = tmp)
summary(m2)
```

# Calculate change in Shannon index from baseline
```{r delta_shannon, fig.width = 7, fig.height = 5}
dd <- smpl
dd[, delta := Shannon - Shannon[WEEK == "baseline"],
   by = ID]
dd$diff <- paste(dd$WEEK,
                 "-baseline",
                 sep = "")

dd <- dd[WEEK != "baseline",]

p1 <- ggplot(dd,
             aes(x = TREATMENT,
                 y = delta,
                 fill = Genotype)) +
  facet_wrap(~ diff) +
  geom_hline(yintercept = 0,
             linetype = "dashed") +
  geom_point(position = position_dodge(0.3),
             shape = 21,
             size = 3) +
  scale_y_continuous("Shannon Index Percent Change from Baseline") +
  theme(axis.text.x = element_text(angle = 45,
                                   hjust = 1))
print(p1)

dd$TREATMENT <- factor(dd$TREATMENT,
                        levels = c("DSS",
                                   "Naive",
                                   "DSS+PEITC",
                                   "DSS+Cranberry"))
dd$Genotype <- factor(dd$Genotype,
                       levels = c("widetype",
                                  "nrf2KO"))

m1 <- lm(delta ~ TREATMENT*Genotype,
         data = dd)
summary(m1)

# No significant interactions, proceed with 2-way analysis
m2 <- lm(delta ~ TREATMENT + Genotype,
         data = dd)
summary(m2)
```

At Week 8 there was significantly smaller increase of alpha diversity from baseline in Nrf2 KO compared to WT, and in DSS+Cranberry compared to DSS only.

# Load aminoacids
```{r aminoacids_data}
aa <- fread("data_sep2019/sep2019_aminoacids.csv")

aa <- aa[!is.na(ID), ]
aa$ID <- paste0(aa$ID,
                aa$CAGE)

smpl1 <- unique(smpl[, c("ID",
                            "TREATMENT",
                            "Genotype")])
smpl1$ID <- as.character(smpl1$ID)
aa <- merge(smpl1,
            aa,
            by = "ID")
aa[, trt_week := paste(TREATMENT,
                       WEEK,
                       sep = "_")]
aa$trt_week <- factor(aa$trt_week,
                      levels = c("Naive_week2",
                                 "Naive_week6",
                                 "DSS_week2",
                                 "DSS_week6",
                                 "DSS+Cranberry_week2",
                                 "DSS+Cranberry_week6" ,
                                 "DSS+PEITC_week2",
                                 "DSS+PEITC_week6"))
```

# Aminoacids by animal, treatment group and timepoint
```{r aminoacids_plots, fig.height = 4, fig.width = 5}
for (i in 8:(ncol(aa) - 1)) {
  tmp <- aa[, c(1, 3, 7, ncol(aa), i), with = FALSE]
  colnames(tmp)[5] <- "Y"
  p1 <- ggplot(tmp,
               aes(x = trt_week,
                   y = Y,
                   fill = Genotype,
                   group = ID)) +
    geom_line(position = position_dodge(0.3)) +
    geom_point(shape = 21,
               size = 3,
               position = position_dodge(0.3)) +
    scale_x_discrete("") +
    scale_y_continuous(colnames(aa)[i]) +
    theme(axis.text.x = element_text(angle = 45,
                                     hjust = 1))
  # tiff(filename = paste0("tmp/",
  #                        colnames(aa)[i],
  #                        ".tiff"),
  #      height = 4,
  #      width = 5,
  #      units = "in",
  #      res = 600,
  #      compression = "lzw+p")
  # print(p1)
  # graphics.off()
  
  print(p1)
}
```

# Aminoacids over time, group averages
```{r aminoacids_avg_plots, fig.height = 4, fig.width = 5}
out <- list()
for (i in 8:27) {
  tmp <- aa[, c(2, 3, 7, i), 
            with = FALSE]
  names(tmp)[4] <- "val"
  
  tmp[, mu := mean(val,
                   na.rm = TRUE),
      by = list(TREATMENT,
                Genotype,
                WEEK)]
  tmp[, sem := sd(val,
                  na.rm = TRUE)/sqrt(.N),
      by = list(TREATMENT,
                Genotype,
                WEEK)]
  out[[i - 7]] <- data.table(Aminoacid = names(aa)[i],
                             unique(tmp[, c("TREATMENT",
                                            "Genotype",
                                            "WEEK",
                                            "mu",
                                            "sem")]))
}
muaa <- rbindlist(out)
muaa$Aminoacid <- factor(muaa$Aminoacid,
                         levels = unique(muaa$Aminoacid))
muaa$Genotype <- factor(muaa$Genotype,
                        levels = c("widetype",
                                   "nrf2KO"),
                        labels = c("Wild Type",
                                   "Nrf2 KO"))

for (i in 1:nlevels(muaa$Aminoacid)) {
  p1 <- ggplot(muaa[Aminoacid == levels(muaa$Aminoacid)[i], ],
               aes(x = WEEK,
                   y = mu,
                   ymin = mu - sem,
                   ymax = mu + sem,
                   fill = TREATMENT,
                   group = TREATMENT)) +
    facet_wrap(~ Genotype) +
    geom_errorbar(position = position_dodge(0.4),
                  width = 0.4) +
    geom_line(position = position_dodge(0.4)) +
    geom_point(size = 3,
               shape = 21,
               position = position_dodge(0.4)) +
    scale_x_discrete("",
                     breaks = c("week2",
                                "week6"),
                     labels = c("Week 2",
                                "Week 6")) +
    scale_y_continuous(levels(muaa$Aminoacid)[i]) +
    theme(axis.text.x = element_text(angle = 45,
                                     hjust = 1),
          legend.position = "top") 
  
  # tiff(filename = paste0("tmp/avg_",
  #                        levels(muaa$Aminoacid)[i],
  #                        ".tiff"),
  #      height = 5,
  #      width = 6,
  #      units = "in",
  #      res = 600,
  #      compression = "lzw+p")
  # print(p1)
  # graphics.off()
  
  print(p1)
}
```

# Aminoacid data PCA
```{r aminoacids_pca}
dt_pca <- aa[, Alanine:glutamine]

# m1 <- prcomp(dt_pca,
#              center = TRUE,
#              scale. = TRUE)

# m1 <- prcomp(dt_pca,
#              center = FALSE,
#              scale. = FALSE)

m1 <- prcomp(dt_pca)

summary(m1)


# Select PC-s to pliot (PC1 & PC2)
choices <- 1:2
# Scores, i.e. points (df.u)
dt.scr <- data.table(m1$x[, choices])
# Add grouping variable
dt.scr$grp <- aa$trt_week
dt.scr$TREATMENT <- aa$TREATMENT
dt.scr$WEEK <- aa$WEEK
dt.scr

# Loadings, i.e. arrows (df.v)
dt.rot <- as.data.frame(m1$rotation[, choices])
dt.rot$feat <- rownames(dt.rot)
dt.rot <- data.table(dt.rot)
dt.rot

dt.load <- melt.data.table(dt.rot,
                           id.vars = "feat",
                           measure.vars = 1:2,
                           variable.name = "pc",
                           value.name = "loading")
dt.load$feat <- factor(dt.load$feat,
                       levels = unique(dt.load$feat))
# Plot loadings
p0 <- ggplot(data = dt.load,
             aes(x = feat,
                 y = loading)) +
  facet_wrap(~ pc,
             nrow = 2) +
  geom_bar(stat = "identity") +
  ggtitle("PC Loadings") +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 45,
                                   hjust = 1))
tiff(filename = "tmp/pc.1.2_loadings.tiff",
     height = 5,
     width = 8,
     units = 'in',
     res = 300,
     compression = "lzw+p")
print(p0)
graphics.off()

print(p0)
```

```{r aminoacids_pca_axes}
# Axis labels
u.axis.labs <- paste(colnames(dt.rot)[1:2], 
                     sprintf('(%0.1f%% explained var.)', 
                             100*m1$sdev[choices]^2/sum(m1$sdev^2)))
u.axis.labs
```

```{r aminoacids_biplot, fig.height = 10, fig.width = 10}
# Based on Figure p0, keep only a few variables with high loadings in PC1 and PC2----
# var.keep.ndx <- which(dt.rot$feat %in% c(...))
# Or select all
# var.keep.ndx <- 3:ncol(dt1)
# Use dt.rot[var.keep.ndx,] and dt.rot$feat[var.keep.ndx]
p1 <- ggplot(data = dt.rot,
             aes(x = PC1,
                 y = PC2)) +
  coord_equal() +
  geom_point(data = dt.scr,
             aes(fill = grp),
             shape = 21,
             size = 2,
             alpha = 0.5) +
  geom_segment(aes(x = 0,
                   y = 0,
                   xend = 10*PC1,
                   yend = 10*PC2),
               arrow = arrow(length = unit(1/2, 'picas')),
               # size = 1, 
               color = "black") +
  geom_text(aes(x = 11*PC1,
                y = 11*PC2,
                label = dt.rot$feat),
            # size = 5,
            hjust = 0.5) +
  scale_x_continuous(u.axis.labs[1]) +
  scale_y_continuous(u.axis.labs[2]) +
  scale_fill_discrete(name = "Group") +
  ggtitle("Biplot of Aminoacids") +
  theme(plot.title = element_text(hjust = 0.5,
                                  size = 20))
tiff(filename = "tmp/aminoacids_biplot.tiff",
     height = 10,
     width = 10,
     units = 'in',
     res = 300,
     compression = "lzw+p")
print(p1)
graphics.off()

ggplotly(p1)
```

```{r aminoacids_biplot_by_week, fig.height = 10, fig.width = 10}
p2 <- ggplot(data = dt.rot,
             aes(x = PC1,
                 y = PC2)) +
  coord_equal() +
  geom_point(data = dt.scr,
             aes(fill = WEEK),
             shape = 21,
             size = 2,
             alpha = 0.5) +
  geom_segment(aes(x = 0,
                   y = 0,
                   xend = 10*PC1,
                   yend = 10*PC2),
               arrow = arrow(length = unit(1/2, 'picas')),
               size = 1.2, 
               color = "black") +
  geom_text(aes(x = 11*PC1,
                y = 11*PC2,
                label = dt.rot$feat),
            # size = 5,
            hjust = 0.5) +
  scale_x_continuous(u.axis.labs[1]) +
  scale_y_continuous(u.axis.labs[2]) +
  scale_fill_discrete(name = "Week") +
  ggtitle("Biplot of Aminoacids") +
  theme(plot.title = element_text(hjust = 0.5,
                                  size = 20))
tiff(filename = "tmp/aminoacids_by_week_biplot.tiff",
     height = 10,
     width = 10,
     units = 'in',
     res = 300,
     compression = "lzw+p")
print(p2)
graphics.off()

ggplotly(p2)
```

```{r aminoacids_biplot_by_trt, fig.height = 10, fig.width = 10}
p2 <- ggplot(data = dt.rot,
             aes(x = PC1,
                 y = PC2)) +
  coord_equal() +
  geom_point(data = dt.scr,
             aes(fill = TREATMENT),
             shape = 21,
             size = 2,
             alpha = 0.5) +
  geom_segment(aes(x = 0,
                   y = 0,
                   xend = 10*PC1,
                   yend = 10*PC2),
               arrow = arrow(length = unit(1/2, 'picas')),
               size = 1.2, 
               color = "black") +
  geom_text(aes(x = 11*PC1,
                y = 11*PC2,
                label = dt.rot$feat),
            # size = 5,
            hjust = 0.5) +
  scale_x_continuous(u.axis.labs[1]) +
  scale_y_continuous(u.axis.labs[2]) +
  scale_fill_discrete(name = "Treatment") +
  ggtitle("Biplot of Aminoacids") +
  theme(plot.title = element_text(hjust = 0.5,
                                  size = 20))
tiff(filename = "tmp/aminoacids_by_trt_biplot.tiff",
     height = 10,
     width = 10,
     units = 'in',
     res = 300,
     compression = "lzw+p")
print(p2)
graphics.off()

ggplotly(p2)
```
# Remove unmapped OTUs
The 311 unmapped OTUs were removed from further analysis (with 7,247 OTUs left).
```{r remove_unmapped_otu_phylum}
ps1 <- subset_taxa(ps0, 
                   !is.na(Phylum))
dim(ps1@otu_table@.Data)
```

# Counts at Phylum level
```{r counts_p, warning=FALSE,echo=FALSE,message=FALSE}
counts_p <- counts_by_tax_rank(dt1 = otu,
                               aggr_by = "Phylum")
setorder(counts_p, -`190919-01`)
datatable(counts_p,
          rownames = FALSE,
          caption = "Taxonomic  count table",
          class = "cell-border stripe",
          options = list(search = FALSE,
                         pageLength = nrow(counts_p))) %>%
  formatCurrency(columns = 2:ncol(counts_p),
                 currency = "",
                 mark = ",",
                 digits = 0)
```

# Relative abundance (%) at Phylum level
```{r ra_p, warning=FALSE,echo=FALSE,message=FALSE}
ra_p <- ra_by_tax_rank(counts = counts_p,
                       pct = FALSE,
                       digit = 4)

datatable(ra_p,
          rownames = FALSE,
          caption = "Taxonomic  count table",
          class = "cell-border stripe",
          options = list(search = FALSE,
                         pageLength = nrow(ra_p))) %>%
  formatPercentage(columns = 2:ncol(counts_p),
                   digits = 2)
```

Remove phyla with relative abundance of >= 1% in less than 10% of samples.

```{r prev_p}
t1 <- data.table(Phylum = ra_p$Phylum,
                 `Number of Samples` = rowSums(ra_p[, 2:ncol(ra_p)] >= 0.01))
t1$`Percent Samples` <-  t1$`Number of Samples`/72

setorder(t1, -`Number of Samples`)
datatable(t1,
          rownames = FALSE,
          caption = "Taxonomic  count table",
          class = "cell-border stripe",
          options = list(search = FALSE,
                         pageLength = nrow(t1))) %>%
  formatPercentage(columns = 3,
                   digits = 1)
```

We will remove Chlamydiae from this analysis.

```{r keep_phyla, warning=FALSE,echo=FALSE,message=FALSE}
keep_p <- t1$Phylum[t1$`Percent Samples` >= 0.1]
# # Keep all
# keep_p <- t1$Phylum

paste0(keep_p, collapse = ", ")

ps1 <- subset_taxa(ps0, 
                   Phylum %in% keep_p )
otu1 <- data.table(ps1@tax_table@.Data,
                   t(ps1@otu_table@.Data))

datatable(head(otu1, 10),
          rownames = FALSE,
          caption = "Taxonomic  count table",
          class = "cell-border stripe",
          options = list(search = FALSE,
                         pageLength = 10)) %>%
  formatCurrency(columns = 7:ncol(otu1),
                 currency = "",
                 mark = ",",
                 digits = 0)
```

7,224 OTUs, down from 7,247 OTUs in the previous table.

# Relative Abundance in Samples at Different Taxonomic Ranks
## 1. Class
```{r counts_c, warning=FALSE,echo=FALSE,message=FALSE,fig.width=15,fig.height=15}
counts_c <- counts_by_tax_rank(dt1 = otu1,
                               aggr_by = "Class")
ra_c <- ra_by_tax_rank(counts_c)

tax.ranks <- unique(otu1[, c("Phylum",
                             "Class")])

ra_c <- merge(tax.ranks,
              ra_c,
              by = "Class")

total <- rowSums(ra_c[, 3:ncol(ra_c)])

ra_c$Class <- factor(ra_c$Class,
                     levels = ra_c$Class[order(total)])

ra_c$Phylum <- factor(ra_c$Phylum,
                      levels = unique(ra_c$Phylum[order(total)]))
tmp <- melt.data.table(data = ra_c,
                       id.vars = 1:2,
                       measure.vars = 3:ncol(counts_c),
                       variable.name = "SAMPLE_NAME",
                       value.name = "RA")

tmp <- merge(data.table(SAMPLE_NAME = smpl$SAMPLE_NAME,
                        WEEK = smpl$WEEK,
                        TREATMENT = smpl$TREATMENT,
                        Genotype = smpl$Genotype),
             tmp,
             by = "SAMPLE_NAME")

# Plot samples
p1 <- ggplot(tmp,
             aes(x = SAMPLE_NAME,
                 y = RA,
                 fill = Class,
                 color = Phylum)) +
  facet_wrap(~ WEEK + TREATMENT + Genotype,
             scales = "free_x",
             nrow = 3) +
  geom_bar(stat = "identity") +
  scale_x_discrete("") +
  scale_y_continuous(expand = c(0, 0)) +
  theme(axis.text.x = element_text(angle = 0,
                                   hjust = 1))
ggplotly(p1+
           theme(legend.position = "none"))
```

```{r means_c, echo = FALSE, warning = FALSE, message = FALSE}
lra <- ra_melt(ra = ra_c,
               samples = smpl,
               sample_name = "SAMPLE_NAME")

mu <- data.table(aggregate(lra$RA,
                           by = list(Week = lra$WEEK,
                                     Treatment = lra$TREATMENT,
                                     Genotype = lra$Genotype,
                                     Class = lra$Class),
                           FUN = "mean"))
mu[, total := sum(x),
   by = "Class"]
ul <- unique(mu[, c("Class", 
                    "total")])
ul <- ul[order(total),]
mu$Class <- factor(mu$Class,
                   level = ul$Class)
mu$total <- NULL

datatable(mu,
          rownames = FALSE,
          caption = "Taxonomic  count table",
          class = "cell-border stripe",
          options = list(search = FALSE,
                         pageLength = 10,
                         order = list(list(3, 'desc')))) %>%
  formatCurrency(columns = 5,
                 currency = "",
                 mark = ",",
                 digits = 2)
```

# PCA at Class level
```{r pca_c_p0, fig.width = 9, fig.height = 7}
dt_pca <- t(ra_c[, 3:ncol(ra_c)])
colnames(dt_pca) <- ra_c$Class
dt_pca_c <- data.table(SAMPLE_NAME = rownames(dt_pca),
                       dt_pca)
dt_pca_c <- merge(smpl,
                  dt_pca_c,
                  by = "SAMPLE_NAME")

# m1 <- prcomp(dt_pca,
#              center = TRUE,
#              scale. = TRUE)

# m1 <- prcomp(dt_pca,
#              center = FALSE,
#              scale. = FALSE)

m1 <- prcomp(dt_pca)
summary(m1)


# Select PC-s to pliot (PC1 & PC2)
choices <- 1:2
# Scores, i.e. points (df.u)
dt.scr <- data.table(m1$x[, choices])
# Add grouping variable
dt.scr$grp <- paste(dt_pca_c$TREATMENT,
                    dt_pca_c$WEEK,
                    dt_pca_c$Genotype)

dt.scr$TREATMENT <- dt_pca_c$TREATMENT
dt.scr$WEEK <- dt_pca_c$WEEK
dt.scr$Genotype <- dt_pca_c$Genotype
dt.scr

# Loadings, i.e. arrows (df.v)
dt.rot <- as.data.frame(m1$rotation[, choices])
dt.rot$feat <- rownames(dt.rot)
dt.rot <- data.table(dt.rot)
dt.rot

dt.load <- melt.data.table(dt.rot,
                           id.vars = "feat",
                           measure.vars = 1:2,
                           variable.name = "pc",
                           value.name = "loading")
dt.load$feat <- factor(dt.load$feat,
                       levels = unique(dt.load$feat))
# Plot loadings
p0 <- ggplot(data = dt.load,
             aes(x = feat,
                 y = loading)) +
  facet_wrap(~ pc,
             nrow = 2) +
  geom_bar(stat = "identity") +
  ggtitle("PC Loadings") +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 45,
                                   hjust = 1))
tiff(filename = "tmp/pc.1.2_loadings_class.tiff",
     height = 5,
     width = 8,
     units = 'in',
     res = 300,
     compression = "lzw+p")
print(p0)
graphics.off()

print(p0)
```

```{r pca_axes_c}
# Axis labels
u.axis.labs <- paste(colnames(dt.rot)[1:2], 
                     sprintf('(%0.1f%% explained var.)', 
                             100*m1$sdev[choices]^2/sum(m1$sdev^2)))
u.axis.labs
```

```{r biplot_grp_c, fig.height = 10, fig.width = 10}
# Based on Figure p0, keep only a few variables with high loadings in PC1 and PC2----
# var.keep.ndx <- which(dt.rot$feat %in% c(...))
# Or select all
# var.keep.ndx <- 3:ncol(dt1)
# Use dt.rot[var.keep.ndx,] and dt.rot$feat[var.keep.ndx]
p1 <- ggplot(data = dt.rot,
             aes(x = PC1,
                 y = PC2)) +
  coord_equal() +
  geom_point(data = dt.scr,
             aes(fill = grp),
             shape = 21,
             size = 2,
             alpha = 0.5) +
  geom_segment(aes(x = 0,
                   y = 0,
                   xend = 40*PC1,
                   yend = 40*PC2),
               arrow = arrow(length = unit(1/2, 'picas')),
               # size = 1, 
               color = "black") +
  geom_text(aes(x = 44*PC1,
                y = 44*PC2,
                label = dt.rot$feat),
            # size = 5,
            hjust = 0.5) +
  scale_x_continuous(u.axis.labs[1]) +
  scale_y_continuous(u.axis.labs[2]) +
  scale_fill_discrete(name = "Group") +
  ggtitle("Biplot of Classes of Bacteria") +
  theme(plot.title = element_text(hjust = 0.5,
                                  size = 20)) 
tiff(filename = "tmp/class_biplot_grp.tiff",
     height = 10,
     width = 10,
     units = 'in',
     res = 300,
     compression = "lzw+p")
print(p1)
graphics.off()

ggplotly(p1)
```

```{r biplot_grp_with_avg_c, fig.height = 10, fig.width = 10}
# Find centers of each group
grpg <- "grp"
var1 <- eval(parse(text = paste("dt.scr$",
                                grpg,
                                sep = "")))

cntr <- data.table(Group = unique(var1),
                   PC1 = aggregate(x = dt.scr$PC1,
                                   by = list(var1),
                                   FUN = "mean")$x,
                   PC2 = aggregate(x = dt.scr$PC2,
                                   by = list(var1),
                                   FUN = "mean")$x)
p2 <- p1 + geom_label(data = cntr,
                      aes(x = PC1,
                          y = PC2,
                          label = Group,
                          colour = Group),
                      alpha = 0.5,
                      size = 3) +
  scale_color_discrete(guide = FALSE) +
  theme(legend.position = "none")
print(p2)
```

```{r biplot_genotype_c, fig.height = 10, fig.width = 10}
# Based on Figure p0, keep only a few variables with high loadings in PC1 and PC2----
# var.keep.ndx <- which(dt.rot$feat %in% c(...))
# Or select all
# var.keep.ndx <- 3:ncol(dt1)
# Use dt.rot[var.keep.ndx,] and dt.rot$feat[var.keep.ndx]
p1 <- ggplot(data = dt.rot,
             aes(x = PC1,
                 y = PC2)) +
  coord_equal() +
  geom_point(data = dt.scr,
             aes(fill = Genotype),
             shape = 21,
             size = 2,
             alpha = 0.5) +
  geom_segment(aes(x = 0,
                   y = 0,
                   xend = 40*PC1,
                   yend = 40*PC2),
               arrow = arrow(length = unit(1/2, 'picas')),
               # size = 1, 
               color = "black") +
  geom_text(aes(x = 44*PC1,
                y = 44*PC2,
                label = dt.rot$feat),
            # size = 5,
            hjust = 0.5) +
  scale_x_continuous(u.axis.labs[1]) +
  scale_y_continuous(u.axis.labs[2]) +
  scale_fill_discrete(name = "Group") +
  ggtitle("Biplot of Classes of Bacteria") +
  theme(plot.title = element_text(hjust = 0.5,
                                  size = 20))
tiff(filename = "tmp/class_biplot_genotype.tiff",
     height = 10,
     width = 10,
     units = 'in',
     res = 300,
     compression = "lzw+p")
print(p1)
graphics.off()

ggplotly(p1)
```

```{r biplot_genotype_with_avg_c, fig.height = 10, fig.width = 10}
# Find centers of each group
grpg <- "Genotype"
var1 <- eval(parse(text = paste("dt.scr$",
                                grpg,
                                sep = "")))

cntr <- data.table(Group = unique(var1),
                   PC1 = aggregate(x = dt.scr$PC1,
                                   by = list(var1),
                                   FUN = "mean")$x,
                   PC2 = aggregate(x = dt.scr$PC2,
                                   by = list(var1),
                                   FUN = "mean")$x)
p2 <- p1 + geom_label(data = cntr,
                      aes(x = PC1,
                          y = PC2,
                          label = Group,
                          colour = Group),
                      alpha = 0.5,
                      size = 3) +
  scale_color_discrete(guide = FALSE) +
  theme(legend.position = "none")
print(p2)
```

```{r biplot_week_c, fig.height = 10, fig.width = 10}
# Based on Figure p0, keep only a few variables with high loadings in PC1 and PC2----
# var.keep.ndx <- which(dt.rot$feat %in% c(...))
# Or select all
# var.keep.ndx <- 3:ncol(dt1)
# Use dt.rot[var.keep.ndx,] and dt.rot$feat[var.keep.ndx]
p1 <- ggplot(data = dt.rot,
             aes(x = PC1,
                 y = PC2)) +
  coord_equal() +
  geom_point(data = dt.scr,
             aes(fill = WEEK),
             shape = 21,
             size = 2,
             alpha = 0.5) +
  geom_segment(aes(x = 0,
                   y = 0,
                   xend = 40*PC1,
                   yend = 40*PC2),
               arrow = arrow(length = unit(1/2, 'picas')),
               # size = 1, 
               color = "black") +
  geom_text(aes(x = 44*PC1,
                y = 44*PC2,
                label = dt.rot$feat),
            # size = 5,
            hjust = 0.5) +
  scale_x_continuous(u.axis.labs[1]) +
  scale_y_continuous(u.axis.labs[2]) +
  scale_fill_discrete(name = "Group") +
  ggtitle("Biplot of Classes of Bacteria") +
  theme(plot.title = element_text(hjust = 0.5,
                                  size = 20))
tiff(filename = "tmp/class_biplot_week.tiff",
     height = 10,
     width = 10,
     units = 'in',
     res = 300,
     compression = "lzw+p")
print(p1)
graphics.off()

ggplotly(p1)
```

```{r biplot_week_with_avg_c, fig.height = 10, fig.width = 10}
# Find centers of each group
grpg <- "WEEK"
var1 <- eval(parse(text = paste("dt.scr$",
                                grpg,
                                sep = "")))

cntr <- data.table(Group = unique(var1),
                   PC1 = aggregate(x = dt.scr$PC1,
                                   by = list(var1),
                                   FUN = "mean")$x,
                   PC2 = aggregate(x = dt.scr$PC2,
                                   by = list(var1),
                                   FUN = "mean")$x)
p2 <- p1 + geom_label(data = cntr,
                      aes(x = PC1,
                          y = PC2,
                          label = Group,
                          colour = Group),
                      alpha = 0.5,
                      size = 3) +
  scale_color_discrete(guide = FALSE) +
  theme(legend.position = "none")
print(p2)
```

```{r biplot_trt_c, fig.height = 8, fig.width = 8}
# Based on Figure p0, keep only a few variables with high loadings in PC1 and PC2----
# var.keep.ndx <- which(dt.rot$feat %in% c(...))
# Or select all
# var.keep.ndx <- 3:ncol(dt1)
# Use dt.rot[var.keep.ndx,] and dt.rot$feat[var.keep.ndx]

p1 <- ggplot(data = dt.rot,
             aes(x = PC1,
                 y = PC2)) +
  coord_equal() +
  geom_point(data = dt.scr,
             aes(fill = TREATMENT),
             shape = 21,
             size = 3,
             alpha = 0.5) +
  geom_segment(aes(x = 0,
                   y = 0,
                   xend = 40*PC1,
                   yend = 40*PC2),
               arrow = arrow(length = unit(1/2, 'picas')),
               # size = 1, 
               color = "black") +
  geom_text(aes(x = 44*PC1,
                y = 44*PC2,
                label = dt.rot$feat),
            # size = 5,
            hjust = 0.5) +
  scale_x_continuous(u.axis.labs[1]) +
  scale_y_continuous(u.axis.labs[2]) +
  scale_fill_manual(name = "Treatment",
                    breaks = levels(dt.scr$TREATMENT),
                                    values = c("red",
                                               "green",
                                               "orange",
                                               "blue")) +
                      ggtitle("Biplot of Classes of Bacteria") +
                      theme(plot.title = element_text(hjust = 0.5,
                                                      size = 20))
                    ggplotly(p1)
```

```{r biplot_trt_with_avg_c, fig.height = 8, fig.width = 8}
# Find centers of each group
# grpg <- "TREATMENT"
# var1 <- eval(parse(text = paste("dt.scr$",
#                                 grpg,
#                                 sep = "")))

# cntr <- data.table(Group = levels(var1),
#                    PC1 = aggregate(x = dt.scr$PC1,
#                                    by = list(var1),
#                                    FUN = "mean")$x,
#                    PC2 = aggregate(x = dt.scr$PC2,
#                                    by = list(var1),
#                                    FUN = "mean")$x)

cntr <- data.table(PC1 = aggregate(x = dt.scr$PC1,
                                   by = list(dt.scr$TREATMENT),
                                   FUN = "mean"),
                   PC2 = aggregate(x = dt.scr$PC2,
                                   by = list(var1),
                                   FUN = "mean")$x)
colnames(cntr) <- c("TREATMENT",
                    "PC1",
                    "PC2")

p2 <- p1 + geom_label(data = cntr,
                      aes(x = PC1,
                          y = PC2,
                          label = TREATMENT,
                          colour = TREATMENT),
                      alpha = 0.5,
                      size = 3) +
  scale_color_manual(guide = FALSE,
                    breaks = levels(cntr$TREATMENT),
                                    values = c("red",
                                               "green",
                                               "orange",
                                               "blue")) +
  theme(legend.position = "none")

tiff(filename = "tmp/class_biplot_trt.tiff",
     height = 8,
     width = 8,
     units = 'in',
     res = 300,
     compression = "lzw+p")
print(p2)
graphics.off()

print(p2)
```

```{r multinom_c}
# Covariates only
m1 <- multinom(TREATMENT ~ WEEK + Genotype,
               data = dt.scr)
summary(m1)

prd1 <- predict(m1)

t1 <- table(Predicted = prd1,
            Observed = dt.scr$TREATMENT)

# PC1 alone
m2 <- multinom(TREATMENT ~ PC1,
               data = dt.scr)
summary(m2)

prd2 <- predict(m2)

t2 <- table(Predicted = prd2,
            Observed = dt.scr$TREATMENT)

# PC1 with covariates
m3 <- multinom(TREATMENT ~ PC1 + WEEK + Genotype,
               data = dt.scr)
summary(m3)

prd3 <- predict(m3)

t3 <- table(Predicted = prd3,
            Observed = dt.scr$TREATMENT)

# PC1 + PC2 with covariates
m4 <- multinom(TREATMENT ~ PC1 + PC2 + WEEK + Genotype,
               data = dt.scr)
summary(m4)

prd4 <- predict(m4)

t4 <- table(Predicted = prd4,
            Observed = dt.scr$TREATMENT)

# Confusion tables
datatable(cbind(t1), 
          caption = "Covariates Only")
datatable(cbind(t2),
          caption = "PC1 Only")
datatable(cbind(t3),
          caption = "PC1 with covariates")
datatable(cbind(t4),
          caption = "PC1 + PC2 with covariates")

# Compare models
anova(m1, m3)
anova(m2, m3)
anova(m4, m3)
```

The results suggest that:  
a. Covariates alone (timepoint and genotype) cannot explain the difference between treatments.  
b. Principal Component 1 (PC1) can explain the differences in relative abundance of classes in the samples. The model does not improve significantly by adding the covariates or the PC2. However, the covariates should stay in the model for adjustment, and PC2 slightly improves the predictions.  
c. The full model (PC1 + PC2 + Week + Genotype) correctly classifies 12 out of 18 Naive samples, and 12 out of 18 DSS+Cranburry samples.  
  
Continuing the same analysis at Order, Family and Genus levels.

DO ROC AUC NEST!!!
```{r roc_FROM_KEAP1NRF2_STUDY, height = 7,fig.width = 8}
# # Output probobilities----
# prd1.1 <- predict(m1,
#                   type = "probs")
# prd1.1 <- data.table(ID = df.u$ID,
#                      Treatment = df.u$Treatment,
#                      round(prd1.1,
#                            4))
# 
# prd2.1 <- predict(m2,
#                   type = "probs")
# prd2.1 <- data.table(ID = df.u$ID,
#                      Treatment = df.u$Treatment,
#                      round(prd2.1,
#                            4))
# 
# # Sensitivity/Specificity
# # Tresholds
# trhd <- seq(0, 1, by = 0.01)
# 
# out1 <- list()
# for (i in 1:length(trhd)) {
#   tmp <- apply(prd1.1[, -c(1:2)],
#                MARGIN = 2,
#                FUN = function(a) {
#                  return(a >= trhd[i])
#                })
#   tmp2 <- apply(X = tmp,
#                 MARGIN = 2,
#                 FUN = function(a) {
#                   aggregate(x = a,
#                             by = list(prd1.1$Treatment),
#                             FUN = sum)$x
#                 })
#   tmp2
#   out1[[i]] <- c(sens = sum(diag(tmp2))/nrow(prd1.1),
#                  spec = (sum(tmp2[upper.tri(tmp2)]) + 
#                            sum(tmp2[lower.tri(tmp2)]))/(nrow(tmp)*(ncol(tmp) - 1)))
# }
# out1 <- data.table(do.call("rbind", out1))
# out1 <- unique(out1)
# 
# out2 <- list()
# for (i in 1:length(trhd)) {
#   tmp <- apply(prd2.1[, -c(1:2)],
#                MARGIN = 2,
#                FUN = function(a) {
#                  return(a >= trhd[i])
#                })
#   tmp2 <- apply(X = tmp,
#                 MARGIN = 2,
#                 FUN = function(a) {
#                   aggregate(x = a,
#                             by = list(prd2.1$Treatment),
#                             FUN = sum)$x
#                 })
#   out2[[i]] <- c(sens = sum(diag(tmp2))/nrow(prd2.1),
#                  spec = (sum(tmp2[upper.tri(tmp2)]) + 
#                            sum(tmp2[lower.tri(tmp2)]))/(nrow(tmp)*(ncol(tmp) - 1)))
# }
# out2 <- data.table(do.call("rbind", out2))
# out2 <- unique(out2)
# 
# # ROC
# roc1 <- auc(x = out1$spec,
#             y = out1$sens,
#             from = 0,
#             to = 1)
# 
# roc2 <- auc(x = out2$spec,
#             y = out2$sens,
#             from = 0,
#             to = 1)
# 
# # ROC plot
# plot(out1$sens ~ out1$spec,
#      type = "l",
#      xlim = c(0, 1),
#      ylim = c(0, 1),
#      xlab = "1 - Specificity",
#      ylab = "Sensitivity",
#      col = "blue")
# lines(out2$sens ~ out2$spec,
#       col = "red")
# text(x = c(0.8, 0.8),
#      y = c(0.2, 0.3),
#      label = c(paste("ROC(PC1) = ",
#                      round(roc1, 
#                            3)),
#                paste("\nROC(PC1+PC2) = ",
#                      round(roc2,
#                            3))),
#      col = c("blue",
#              "red"))
# abline(0, 1, lty = 2)
```

```{r means_c_p0, fig.width = 9, fig.height = 7}
mu$Trt_Genotype <- factor(paste(mu$Treatment,
                          mu$Genotype,
                          sep = "_"))

p0 <- ggplot(mu,
             aes(x = Week,
                 y = x,
                 group = Trt_Genotype)) +
  facet_wrap(~ Class,
             scale = "free_y") +
  geom_line(position = position_dodge(0.3)) +
  geom_point(aes(fill = Trt_Genotype),
             shape = 21,
             size = 2,
             alpha = 0.5,
             position = position_dodge(0.3)) +
  scale_x_discrete("") +
  scale_y_continuous("Relative Abundance (%)") +
  theme(legend.position = "top",
        axis.text.x = element_text(angle = 45,
                                   hjust = 1))

tiff(filename = "tmp/wt_class_over_time.tiff",
     height = 5,
     width = 7,
     units = "in",
     res = 600,
     compression = "lzw+p")
print(p0)
graphics.off()

print(p0)
```

```{r means_c_p1, fig.height = 5, fig.width = 9}
p1 <- ggplot(mu,
             aes(x = x,
                 y = Class,
                 color = Trt_Genotype,
                 shape = Week)) +
  geom_point(size = 3,
             alpha = 0.5) +
  geom_vline(xintercept = 1,
             linetype = "dashed") +
  scale_x_continuous("Relative Abundance (%)") 

tiff(filename = "tmp/wt_class_ra.tiff",
     height = 4,
     width = 7,
     units = "in",
     res = 600,
     compression = "lzw+p")
print(p1)
graphics.off()

ggplotly(p1+
  theme(legend.position = "none"))
```

## 2. Order
```{r counts_o, warning=FALSE,echo=FALSE,message=FALSE,fig.width=15,fig.height=15}
counts_o <- counts_by_tax_rank(dt1 = otu1,
                               aggr_by = "Order")
ra_o <- ra_by_tax_rank(counts_o)

tax.ranks <- unique(otu1[, c("Phylum",
                             "Order")])

ra_o <- merge(tax.ranks,
              ra_o,
              by = "Order")

total <- rowSums(ra_o[, 3:ncol(ra_o)])

ra_o$Order <- factor(ra_o$Order,
                     levels = ra_o$Order[order(total)])

ra_o$Phylum <- factor(ra_o$Phylum,
                      levels = unique(ra_o$Phylum[order(total)]))
tmp <- melt.data.table(data = ra_o,
                       id.vars = 1:2,
                       measure.vars = 3:ncol(counts_o),
                       variable.name = "SAMPLE_NAME",
                       value.name = "RA")

tmp <- merge(data.table(SAMPLE_NAME = smpl$SAMPLE_NAME,
                        WEEK = smpl$WEEK,
                        TREATMENT = smpl$TREATMENT,
                        Genotype = smpl$Genotype),
             tmp,
             by = "SAMPLE_NAME")

# Plot samples
p1 <- ggplot(tmp,
             aes(x = SAMPLE_NAME,
                 y = RA,
                 fill = Order,
                 color = Phylum)) +
  facet_wrap(~ WEEK + TREATMENT + Genotype,
             scales = "free_x",
             nrow = 3) +
  geom_bar(stat = "identity") +
  scale_x_discrete("") +
  scale_y_continuous(expand = c(0, 0)) +
  theme(axis.text.x = element_text(angle = 0,
                                   hjust = 1))
ggplotly(p1+
  theme(axis.text.x = element_text(angle = 0,
                                   hjust = 1),
        legend.position = "none"))
```

```{r means_o, echo = FALSE, warning = FALSE, message = FALSE}
lra <- ra_melt(ra = ra_o,
               samples = smpl,
               sample_name = "SAMPLE_NAME")

mu <- data.table(aggregate(lra$RA,
                           by = list(Week = lra$WEEK,
                                     Treatment = lra$TREATMENT,
                                     Genotype = lra$Genotype,
                                     Order = lra$Order),
                           FUN = "mean"))
mu[, total := sum(x),
   by = "Order"]
ul <- unique(mu[, c("Order", 
                    "total")])
ul <- ul[order(total),]
mu$Order <- factor(mu$Order,
                   level = ul$Order)
mu$total <- NULL

datatable(mu,
          rownames = FALSE,
          caption = "Taxonomic  count table",
          class = "cell-border stripe",
          options = list(search = FALSE,
                         pageLength = 10,
                         order = list(list(3, 'desc')))) %>%
  formatCurrency(columns = 5,
                 currency = "",
                 mark = ",",
                 digits = 2)
```

```{r means_o_p0, fig.width = 9, fig.height = 7}
mu$Trt_Genotype <- factor(paste(mu$Treatment,
                                mu$Genotype,
                                sep = "_"))

p0 <- ggplot(mu,
             aes(x = Week,
                 y = x,
                 group = Trt_Genotype)) +
  facet_wrap(~ Order,
             scale = "free_y") +
  geom_line(position = position_dodge(0.3)) +
  geom_point(aes(fill = Trt_Genotype),
             shape = 21,
             size = 2,
             alpha = 0.5,
             position = position_dodge(0.3)) +
  scale_x_discrete("") +
  scale_y_continuous("Relative Abundance (%)") +
  theme(legend.position = "top",
        axis.text.x = element_text(angle = 45,
                                   hjust = 1))

tiff(filename = "tmp/wt_Order_over_time.tiff",
     height = 5,
     width = 7,
     units = "in",
     res = 600,
     compression = "lzw+p")
print(p0)
graphics.off()

print(p0)
```

```{r means_o_p1, fig.height = 7, fig.width = 7}
p1 <- ggplot(mu,
             aes(x = x,
                 y = Order,
                 color = Trt_Genotype,
                 shape = Week)) +
  geom_point(size = 3,
             alpha = 0.5) +
  geom_vline(xintercept = 1,
             linetype = "dashed") +
  scale_x_continuous("Relative Abundance (%)")

tiff(filename = "tmp/wt_Order_ra.tiff",
     height = 4,
     width = 7,
     units = "in",
     res = 600,
     compression = "lzw+p")
print(p1)
graphics.off()

ggplotly(p1 +
           theme(legend.position = "none"))
```

## 3. Family
```{r counts_f, warning=FALSE,echo=FALSE,message=FALSE,fig.width=15,fig.height=15}
counts_f <- counts_by_tax_rank(dt1 = otu1,
                               aggr_by = "Family")
ra_f <- ra_by_tax_rank(counts_f)

tax.ranks <- unique(otu1[, c("Phylum",
                             "Family")])

ra_f <- merge(tax.ranks,
              ra_f,
              by = "Family")

total <- rowSums(ra_f[, 3:ncol(ra_f)])

ra_f$Family <- factor(ra_f$Family,
                     levels = ra_f$Family[order(total)])

ra_f$Phylum <- factor(ra_f$Phylum,
                      levels = unique(ra_f$Phylum[order(total)]))
tmp <- melt.data.table(data = ra_f,
                       id.vars = 1:2,
                       measure.vars = 3:ncol(counts_f),
                       variable.name = "SAMPLE_NAME",
                       value.name = "RA")

tmp <- merge(data.table(SAMPLE_NAME = smpl$SAMPLE_NAME,
                        WEEK = smpl$WEEK,
                        TREATMENT = smpl$TREATMENT,
                        Genotype = smpl$Genotype),
             tmp,
             by = "SAMPLE_NAME")

# Plot samples
p1 <- ggplot(tmp,
             aes(x = SAMPLE_NAME,
                 y = RA,
                 fill = Family,
                 color = Phylum)) +
  facet_wrap(~ WEEK + TREATMENT + Genotype,
             scales = "free_x",
             nrow = 3) +
  geom_bar(stat = "identity") +
  scale_x_discrete("") +
  scale_y_continuous(expand = c(0, 0)) +
  theme(axis.text.x = element_text(angle = 0,
                                   hjust = 1))
ggplotly(p1+
           theme(legend.position = "none"))
```

```{r means_f, echo = FALSE, warning = FALSE, message = FALSE}
lra <- ra_melt(ra = ra_f,
               samples = smpl,
               sample_name = "SAMPLE_NAME")

mu <- data.table(aggregate(lra$RA,
                           by = list(Week = lra$WEEK,
                                     Treatment = lra$TREATMENT,
                                     Genotype = lra$Genotype,
                                     Family = lra$Family),
                           FUN = "mean"))
mu[, total := sum(x),
   by = "Family"]
ul <- unique(mu[, c("Family", 
                    "total")])
ul <- ul[order(total),]
mu$Family <- factor(mu$Family,
                   level = ul$Family)
mu$total <- NULL

datatable(mu,
          rownames = FALSE,
          caption = "Taxonomic  count table",
          class = "cell-border stripe",
          options = list(search = FALSE,
                         pageLength = 10,
                         Family = list(list(3, 'desc')))) %>%
  formatCurrency(columns = 5,
                 currency = "",
                 mark = ",",
                 digits = 2)
```

NOTE: only the first 24 families had large enough counts - ploting only them.  
  
```{r means_f_p0, fig.width = 9, fig.height = 7}
mu$Trt_Genotype <- factor(paste(mu$Treatment,
                                mu$Genotype,
                                sep = "_"))
mu1 <- droplevels(mu[Family %in% levels(mu$Family)[nlevels(mu$Family):(nlevels(mu$Family) - 24)], ])

p0 <- ggplot(mu1,
             aes(x = Week,
                 y = x,
                 group = Trt_Genotype)) +
  facet_wrap(~ Family,
             scale = "free_y") +
  geom_line(position = position_dodge(0.3)) +
  geom_point(aes(fill = Trt_Genotype),
             shape = 21,
             size = 2,
             alpha = 0.5,
             position = position_dodge(0.3)) +
  scale_x_discrete("") +
  scale_y_continuous("Relative Abundance (%)") +
  theme(legend.position = "top",
        axis.text.x = element_text(angle = 45,
                                   hjust = 1))

tiff(filename = "tmp/wt_Family_over_time.tiff",
     height = 7,
     width = 9,
     units = "in",
     res = 600,
     compression = "lzw+p")
print(p0)
graphics.off()

print(p0)
```

```{r means_f_p1, fig.height = 5, fig.width = 9}
p1 <- ggplot(mu1,
             aes(x = x,
                 y = Family,
                 color = Trt_Genotype,
                 shape = Week)) +
  geom_point(size = 3,
             alpha = 0.5) +
  geom_vline(xintercept = 1,
             linetype = "dashed") +
  scale_x_continuous("Relative Abundance (%)") +
  theme(legend.position = "top")

tiff(filename = "tmp/wt_Family_ra.tiff",
     height = 4,
     width = 7,
     units = "in",
     res = 600,
     compression = "lzw+p")
print(p1)
graphics.off()

ggplotly(p1+
           theme(legend.position = "none"))
```

## 4. Genus
```{r counts_g, warning=FALSE,echo=FALSE,message=FALSE,fig.width=15,fig.height=15}
counts_g <- counts_by_tax_rank(dt1 = otu1,
                               aggr_by = "Genus")
ra_g <- ra_by_tax_rank(counts_g)

tax.ranks <- unique(otu1[, c("Phylum",
                             "Genus")])

ra_g <- merge(tax.ranks,
              ra_g,
              by = "Genus")

total <- rowSums(ra_g[, 3:ncol(ra_g)])

ra_g$Genus <- factor(ra_g$Genus,
                     levels = ra_g$Genus[order(total)])

ra_g$Phylum <- factor(ra_g$Phylum,
                      levels = unique(ra_g$Phylum[order(total)]))
tmp <- melt.data.table(data = ra_g,
                       id.vars = 1:2,
                       measure.vars = 3:ncol(counts_g),
                       variable.name = "SAMPLE_NAME",
                       value.name = "RA")

tmp <- merge(data.table(SAMPLE_NAME = smpl$SAMPLE_NAME,
                        WEEK = smpl$WEEK,
                        TREATMENT = smpl$TREATMENT,
                        Genotype = smpl$Genotype),
             tmp,
             by = "SAMPLE_NAME")

# Plot samples
p1 <- ggplot(tmp,
             aes(x = SAMPLE_NAME,
                 y = RA,
                 fill = Genus,
                 color = Phylum)) +
  facet_wrap(~ WEEK + TREATMENT + Genotype,
             scales = "free_x",
             nrow = 3) +
  geom_bar(stat = "identity") +
  scale_x_discrete("") +
  scale_y_continuous(expand = c(0, 0)) +
  theme(axis.text.x = element_text(angle = 0,
                                   hjust = 1))
ggplotly(p1+
           theme(legend.position = "none"))
```

```{r means_g, echo = FALSE, warning = FALSE, message = FALSE}
lra <- ra_melt(ra = ra_g,
               samples = smpl,
               sample_name = "SAMPLE_NAME")

mu <- data.table(aggregate(lra$RA,
                           by = list(Week = lra$WEEK,
                                     Treatment = lra$TREATMENT,
                                     Genotype = lra$Genotype,
                                     Genus = lra$Genus),
                           FUN = "mean"))
mu[, total := sum(x),
   by = "Genus"]
ul <- unique(mu[, c("Genus", 
                    "total")])
ul <- ul[order(total),]
mu$Genus <- factor(mu$Genus,
                   level = ul$Genus)
mu$total <- NULL

datatable(mu,
          rownames = FALSE,
          caption = "Taxonomic  count table",
          class = "cell-border stripe",
          options = list(search = FALSE,
                         pageLength = 10,
                         Genus = list(list(3, 'desc')))) %>%
  formatCurrency(columns = 5,
                 currency = "",
                 mark = ",",
                 digits = 2)
```

```{r means_g_p0, fig.width = 9, fig.height = 7}
mu$Trt_Genotype <- factor(paste(mu$Treatment,
                                mu$Genotype,
                                sep = "_"))
mu1 <- droplevels(mu[Genus %in% levels(mu$Genus)[nlevels(mu$Genus):(nlevels(mu$Genus) - 35)], ])

p0 <- ggplot(mu1,
             aes(x = Week,
                 y = x,
                 group = Trt_Genotype)) +
  facet_wrap(~ Genus,
             scale = "free_y") +
  geom_line(position = position_dodge(0.3)) +
  geom_point(aes(fill = Trt_Genotype),
             shape = 21,
             size = 2,
             alpha = 0.5,
             position = position_dodge(0.3)) +
  scale_x_discrete("") +
  scale_y_continuous("Relative Abundance (%)") +
  theme(legend.position = "top",
        axis.text.x = element_text(angle = 45,
                                   hjust = 1))

tiff(filename = "tmp/wt_Genus_over_time.tiff",
     height = 9,
     width = 12,
     units = "in",
     res = 600,
     compression = "lzw+p")
print(p0)
graphics.off()

print(p0+
           theme(legend.position = "none"))
```

```{r means_g_p1, fig.height = 9, fig.width = 9}
p1 <- ggplot(mu1,
             aes(x = x,
                 y = Genus,
                 color = Trt_Genotype,
                 shape = Week)) +
  geom_point(size = 3,
             alpha = 0.5) +
  geom_vline(xintercept = 1,
             linetype = "dashed") +
  scale_x_continuous("Relative Abundance (%)") +
  theme(legend.position = "top")

tiff(filename = "tmp/wt_Genus_ra.tiff",
     height = 9,
     width = 9,
     units = "in",
     res = 600,
     compression = "lzw+p")
print(p1)
graphics.off()

ggplotly(p1+
           theme(legend.position = "none"))
```

# Session Information
```{r info,eval=TRUE}
sessionInfo()
```